• Title, Summary, Keyword: Clustering

A Post Web Document Clustering Algorithm (후처리 웹 문서 클러스터링 알고리즘)

  • Im, Yeong-Hui
    • The KIPS Transactions: Part B / v.9B no.1 / pp.7-16 / 2002
  • Post-clustering algorithms, which cluster the results of a Web search engine, have several requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed Concept ART combines the concept vector, which has several advantages in document clustering, with Fuzzy ART, which is known as a real-time clustering algorithm. Moreover, we show that it is applicable to general-purpose clustering as well as post-clustering.
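
The abstract does not say how the concept vector is combined with Fuzzy ART, so the following is only a minimal sketch of plain Fuzzy ART (complement coding, choice function, vigilance test, learning rule). The parameter names `rho`, `alpha`, and `beta` and the toy inputs are assumptions for illustration; this is not the paper's Concept ART.

```python
def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster vectors with components in [0, 1] in a single online pass.

    rho   -- vigilance: higher values produce more, tighter categories
    alpha -- small choice parameter
    beta  -- learning rate (1.0 is "fast learning")
    """
    weights = []  # one weight vector per category
    labels = []
    for a in samples:
        # Complement coding keeps the input norm constant.
        i = list(a) + [1.0 - x for x in a]
        norm_i = sum(i)
        # Try categories in order of decreasing choice value.
        order = sorted(
            range(len(weights)),
            key=lambda j: -sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j])),
        )
        chosen = None
        for j in order:
            match = sum(fuzzy_and(i, weights[j])) / norm_i
            if match >= rho:  # vigilance test passed
                chosen = j
                break
        if chosen is None:
            # No category is close enough: create a new one.
            weights.append(i)
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            m = fuzzy_and(i, w)
            weights[chosen] = [beta * mk + (1.0 - beta) * wk for mk, wk in zip(m, w)]
        labels.append(chosen)
    return labels, weights


# Toy 2-D inputs (hypothetical); real use would pass document vectors scaled to [0, 1].
demo_labels, _ = fuzzy_art([[0.1, 0.1], [0.12, 0.1], [0.9, 0.9], [0.88, 0.92]], rho=0.8)
```

Because each input is processed once and categories are created on demand, the number of clusters does not have to be fixed in advance, which is one reason ART-style methods suit search-result clustering.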

Heuristic algorithm to raise efficiency in clustering (군집의 효율향상을 위한 휴리스틱 알고리즘)

  • Lee, Seog-Hwan;Park, Seung-Hun
    • Journal of the Korea Safety Management and Science / v.11 no.3 / pp.157-166 / 2009
  • In this study, we developed a heuristic algorithm that achieves better clustering efficiency than conventional algorithms. Conventional clustering algorithms have low efficiency because they lack a solid method for selecting the initial cluster centers and have difficulty searching for a clustering solution. The EMC (Expanded Moving Center) heuristic algorithm is proposed to resolve this problem: it selects the initial cluster centers and searches for a solution systematically. Clustering experiments were performed to evaluate the performance of the EMC heuristic algorithm. In terms of squared error, the EMC heuristic algorithm outperformed the other algorithms in a real case study, and its advantage grew considerably as the number of clusters increased.
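
The abstract evaluates clusterings by squared error but does not describe the EMC steps, so the sketch below only illustrates that criterion: the sum of squared distances from each point to the centroid of its cluster. The function names and the toy points are assumptions, not taken from the paper.

```python
def centroids(points, labels):
    """Mean point of every cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, k in zip(points, labels):
        acc = sums.setdefault(k, [0.0] * len(p))
        for d, x in enumerate(p):
            acc[d] += x
        counts[k] = counts.get(k, 0) + 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def squared_error(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    centers = centroids(points, labels)
    return sum(
        (x - c) ** 2
        for p, k in zip(points, labels)
        for x, c in zip(p, centers[k])
    )


# Hypothetical data: the first labelling groups the two nearby points together.
pts = [(0, 0), (0, 2), (10, 0)]
good = squared_error(pts, [0, 0, 1])
poor = squared_error(pts, [0, 1, 1])
```

A lower value means tighter clusters, so two algorithms can be compared on the same data and the same number of clusters by comparing this number.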

Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure (시퀀스 요소 기반의 유사도를 이용한 시퀀스 데이터 클러스터링)

  • 오승준;김재련
    • Proceedings of the Korea Intelligent Information System Society Conference / pp.221-229 / 2004
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a method for clustering such sequence datasets. The similarity between sequences must be decided before clustering the sequences. This study proposes a new similarity measure to compute the similarity between two sequences using a sequence element. Two clustering algorithms using the proposed similarity measure are proposed: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed clustering algorithms is better than that of clusters produced by traditional clustering algorithms.
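
The abstract does not define its sequence-element-based similarity, so the sketch below substitutes a different, order-aware stand-in: the length of the longest common subsequence divided by the longer sequence length. It then feeds that measure to a naive average-linkage hierarchical clustering. The `min_sim` stopping threshold and the toy sequences are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Stand-in similarity in [0, 1]; NOT the measure proposed in the paper."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def agglomerate(seqs, min_sim):
    """Merge the most similar pair of clusters until no pair reaches min_sim."""
    clusters = [[i] for i in range(len(seqs))]

    def link(c1, c2):  # average linkage
        total = sum(similarity(seqs[i], seqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            (link(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < min_sim:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy sequences.
demo_clusters = agglomerate(["ACGT", "ACGA", "TTTT", "TTTA"], min_sim=0.5)
```

This naive version recomputes all pairwise linkages at every merge, which is only practical for small inputs; the sampling and k-nearest-neighbor approach mentioned in the abstract targets exactly that scalability problem.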

Web Image Clustering with Text Features and Measuring its Efficiency

  • Cho, Soo-Sun
    • Journal of Korea Multimedia Society / v.10 no.6 / pp.699-706 / 2007
  • This article presents an approach to improving the clustering of Web images by using high-level semantic features from text information relevant to the images as well as low-level visual features of the images themselves. These high-level text features can be obtained from image URLs and file names, page titles, hyperlinks, and surrounding text. As the clustering algorithm, the self-organizing map (SOM) proposed by Kohonen is used. To evaluate the clustering efficiency of SOMs, we propose a simple but effective measure indicating the accumulativeness of same-class images and the perplexities of class distributions. Our approach advances the existing measures by defining and using two new measures, accumulativeness on the most superior clustering node and concentricity, to evaluate the clustering efficiency of SOMs. The experimental results show that the high-level text features are more useful in SOM-based Web image clustering.

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Proceedings of the Korean Data and Information Science Society Conference / pp.59-69 / 2005
  • Data mining techniques are used to find hidden knowledge, unexpected patterns, relations, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and off-line or on-line market research. We analyze the 2001 Gyeongnam social indicator survey data using the twostep clustering technique to obtain environmental information. Twostep clustering is classified as a partitional clustering method. The twostep clustering outputs can be applied to environmental preservation and improvement.

Refinement of Document Clustering by Using NMF

  • Shinnou, Hiroyuki;Sasaki, Minoru
    • Proceedings of the Korean Society for Language and Information Conference / pp.430-439 / 2007
  • In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method that is effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result; therefore, we can use NMF as a refinement method. First we perform min-max cut (Mcut), which is a powerful spectral clustering method, and then refine the result via NMF. In principle, this should yield an accurate clustering result. However, NMF often fails to improve the given clustering result. To overcome this problem, we use the Mcut objective function to stop the iteration of NMF.
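
The stopping idea can be sketched as follows. The Mcut objective is computed here as the sum over clusters of cut(C, rest) / W(C), where W(C) is the total similarity inside C (including the diagonal, an assumption of this sketch). The helper `refine_while_improving` and its `step` argument are hypothetical: `step` stands for one refinement iteration such as an NMF update followed by relabelling, and stopping as soon as the objective fails to decrease is an assumed rule, since the abstract does not give the exact one.

```python
def mcut_objective(sim, labels):
    """Min-max cut objective of a labelling under a symmetric similarity matrix.

    Assumes every cluster has positive internal similarity (for example,
    self-similarity of 1 on the diagonal), so the division is safe.
    """
    total = 0.0
    for k in set(labels):
        inside = [i for i, lab in enumerate(labels) if lab == k]
        outside = [i for i, lab in enumerate(labels) if lab != k]
        cut = sum(sim[i][j] for i in inside for j in outside)
        within = sum(sim[i][j] for i in inside for j in inside)
        total += cut / within
    return total


def refine_while_improving(sim, labels, step, max_iter=50):
    """Apply `step` repeatedly and keep the last labelling that lowered Mcut."""
    best, best_score = labels, mcut_objective(sim, labels)
    for _ in range(max_iter):
        candidate = step(best)
        score = mcut_objective(sim, candidate)
        if score >= best_score:  # refinement stopped helping
            break
        best, best_score = candidate, score
    return best


# Hypothetical 4-document similarity matrix with two obvious groups.
S = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
good_score = mcut_objective(S, [0, 0, 1, 1])
bad_score = mcut_objective(S, [0, 1, 0, 1])
```

Guarding each iteration with the objective means a refinement step that degrades the clustering is discarded rather than accepted.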

Zone-Based Self-Organized Clustering with Byzantine Agreement in MANET

  • Sung, Soon-Hwa
    • Journal of Communications and Networks / v.10 no.2 / pp.221-227 / 2008
  • The proposed zone-based self-organized clustering broadcasts neighbor information only to a zone with the same ID. In addition, zone-based self-organized clustering with unique IDs can communicate securely even if the state transition of its nodes is threatened by corrupted nodes. For this security, the Byzantine agreement protocol with proactive asynchronous verifiable secret sharing (AVSS) is considered. Simulation results show that the efficiency and security of the proposed clustering are better than those of traditional clustering. This paper therefore describes a new and extended self-organized clustering that securely seeks to minimize interference in mobile ad hoc networks (MANETs).

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

  • Kongwudhikunakorn, Supavit;Waiyamai, Kitsana
    • Journal of Information Processing Systems / v.16 no.2 / pp.277-300 / 2020
  • This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Due to the characteristics of these documents, which are usually short and sparse, an appropriate technique is required to discover hidden knowledge. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are expanded by external knowledge sources represented by a Distributed Representation. To cluster documents, a K-means partitioning-based clustering technique is applied, where the similarities of documents are measured by word mover's distance. To validate the effectiveness of the proposed method, experiments were conducted to compare the clustering quality against several leading methods. The proposed method produced clusters of documents that resulted in higher precision, recall, F1-score, and adjusted Rand index for both real-world and standard data sets. Furthermore, manual inspection of the clustering results was conducted to observe the efficacy of the proposed method. The topics of each document cluster are undoubtedly reflected by members in the cluster.
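
Exact word mover's distance requires solving an optimal transport problem. The sketch below instead computes the relaxed word mover's distance, a known lower bound in which each word simply moves to its nearest word in the other document. The 2-D embeddings are toy values invented for illustration; a real system would use pretrained word vectors, and this is not the paper's implementation.

```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words weights for one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def one_sided(src, dst, emb):
    """Cost when every source word moves entirely to its closest target word."""
    return sum(
        weight * min(math.dist(emb[w], emb[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(doc_a, doc_b, emb):
    """Relaxed WMD: the tighter of the two one-sided lower bounds."""
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_sided(a, b, emb), one_sided(b, a, emb))


# Hypothetical toy embeddings: animals near the origin, vehicles far away.
EMB = {"cat": (0.0, 0.0), "dog": (1.0, 0.0), "car": (10.0, 0.0), "truck": (11.0, 0.0)}
same_topic = relaxed_wmd(["cat", "dog"], ["dog", "cat"], EMB)
cross_topic = relaxed_wmd(["cat", "dog"], ["car", "truck"], EMB)
```

Because the distance works on word vectors rather than exact word overlap, two short texts with no words in common can still be measured as close, which is what makes it attractive for short, sparse documents.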

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.758-764 / 2018
  • Digital documents such as patents, scholarly papers, and research reports have author keywords that summarize the topics of the documents. Different documents are likely to describe the same topic if they share the same keywords. Document clustering aims to group documents on similar topics using an unsupervised learning method. However, although document clustering is used in various kinds of data analysis, it is difficult to apply to a large number of documents because of its computational complexity. In this case, we can cluster and connect massive document collections efficiently using keywords. Existing bottom-up hierarchical clustering incurs high computational and time costs when clustering a large number of keywords. This paper proposes an inverted-index-based bottom-up clustering for keywords and analyzes the results of clustering massive keyword sets extracted from scholarly papers and research reports.
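
One way an inverted index can cut the cost of bottom-up keyword clustering is sketched below: the index maps each keyword to the documents containing it, and only keyword pairs that co-occur in at least one document are compared, instead of all pairs. The Jaccard criterion, the `min_jaccard` threshold, and the union-find merging are assumptions for illustration, not the paper's algorithm.

```python
from collections import defaultdict
from itertools import combinations


def cluster_keywords(docs, min_jaccard=0.5):
    """Group keywords whose posting lists overlap strongly."""
    # Inverted index: keyword -> set of ids of documents containing it.
    postings = defaultdict(set)
    for doc_id, keywords in enumerate(docs):
        for kw in set(keywords):
            postings[kw].add(doc_id)

    # Union-find over keywords for bottom-up merging.
    parent = {kw: kw for kw in postings}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    seen = set()
    for keywords in docs:
        # Candidate pairs come only from keywords sharing a document.
        for a, b in combinations(sorted(set(keywords)), 2):
            if (a, b) in seen:
                continue
            seen.add((a, b))
            inter = len(postings[a] & postings[b])
            union = len(postings[a] | postings[b])
            if inter / union >= min_jaccard:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for kw in postings:
        groups[find(kw)].add(kw)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical author-keyword lists, one per document.
DOCS = [
    ["clustering", "k-means"],
    ["clustering", "k-means", "nmf"],
    ["sensor", "energy"],
    ["sensor", "energy"],
]
keyword_groups = cluster_keywords(DOCS, min_jaccard=0.6)
```

Keywords that never appear together are never compared, so the work scales with the number of co-occurring pairs rather than with the square of the vocabulary size.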

Implementation of a Top-down Clustering Protocol for Wireless Sensor Networks (무선 네트워크를 위한 하향식 클러스터링 프로토콜의 구현)

  • Yun, Phil-Jung;Kim, Sang-Kyung;Kim, Chang-Hwa
    • Journal of Information Technology Services / v.9 no.3 / pp.95-106 / 2010
  • Much research has been conducted to increase energy efficiency in wireless sensor networks. One of the primary research topics is clustering protocols, which configure sensor networks as hierarchical structures by grouping sensor nodes into clusters. However, legacy clustering protocols do not propose detailed, implementation-level methods for determining a cluster's boundary, configuring a cluster, and communicating among clusters. Moreover, many of them rely on assumptions that are inappropriate for a real sensor field. In this paper, we design and implement a new T-Clustering (Top-down Clustering) protocol, which jointly considers node density, the distance between cluster heads, and the remaining energy of a node. Our proposal is a sink-node-oriented top-down clustering protocol that can form uniform clusters throughout the network. Further, it provides re-clustering functions according to the state of the network. To verify the protocol's feasibility, we implemented and tested the T-Clustering protocol on Crossbow's MICAz nodes running TinyOS 2.0.2.
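
The abstract names three inputs to head selection (node density, distance between cluster heads, remaining energy) but not how they are combined. The greedy sketch below is therefore purely hypothetical: it ranks nodes by energy times (1 + neighbor count) and accepts a node as head only if it is far enough from every head chosen so far. The scoring formula and the parameters `radio_range` and `min_head_gap` are invented for illustration and are not the T-Clustering protocol.

```python
import math


def pick_cluster_heads(nodes, radio_range, min_head_gap):
    """Greedy head selection.

    nodes -- dict mapping node id to (x, y, remaining_energy)
    """

    def density(i):
        xi, yi, _ = nodes[i]
        return sum(
            1
            for j, (x, y, _) in nodes.items()
            if j != i and math.dist((xi, yi), (x, y)) <= radio_range
        )

    def score(i):
        # Hypothetical score: favors energetic nodes in dense areas.
        return nodes[i][2] * (1 + density(i))

    heads = []
    # Highest score first; ties broken by node id for determinism.
    for i in sorted(nodes, key=lambda n: (-score(n), n)):
        xi, yi, _ = nodes[i]
        if all(math.dist((xi, yi), nodes[h][:2]) >= min_head_gap for h in heads):
            heads.append(i)
    return heads


def assign_members(nodes, heads):
    """Every non-head node joins its nearest cluster head."""
    return {
        i: min(heads, key=lambda h: math.dist(nodes[i][:2], nodes[h][:2]))
        for i in nodes
        if i not in heads
    }


# Hypothetical sensor field: three nodes near the origin, two far away.
FIELD = {1: (0, 0, 1.0), 2: (1, 0, 0.9), 3: (0, 1, 0.5), 4: (10, 10, 0.8), 5: (11, 10, 0.4)}
field_heads = pick_cluster_heads(FIELD, radio_range=2, min_head_gap=5)
field_members = assign_members(FIELD, field_heads)
```

Enforcing a minimum gap between heads is one simple way to spread clusters evenly over the field; re-running the selection as energies change would give a basic form of re-clustering.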