• Title/Summary/Keyword: hierarchical clustering algorithm


K-means Clustering Method according to Documentation Numbers (문서 수에 따른 가중치를 적용한 K-means 문서 클러스터링)

  • Cho, Cea-Sung;An, Dong-Un;Jeong, Sung-Jong;Lee, Shin-Won
    • Annual Conference of KIPS / 2003.05a / pp.345-348 / 2003
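  • In this paper, we cluster documents using the K-means clustering algorithm, a hierarchical method among document clustering methods. The conventional K-means clustering algorithm has the problem that, when the number of documents is large, too many documents are assigned to a single cluster. To mitigate this skew, we propose a method that assigns weights to documents according to the number of documents assigned to each cluster and then clusters them again. The experimental results were evaluated with the F-measure, the harmonic mean that combines precision and recall, and showed a performance improvement of more than 9% over the conventional algorithm.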
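The F-measure used for evaluation above can be sketched as follows. This is a minimal illustration, assuming the balanced form (equal weight on precision and recall) and a comparison of one cluster against one reference class; the function names `precision_recall` and `f_measure` are hypothetical and not taken from the paper.

```python
def precision_recall(cluster, relevant):
    """Precision and recall of one cluster against one reference class (both sets of document IDs)."""
    hits = len(cluster & relevant)
    precision = hits / len(cluster) if cluster else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: a cluster of 4 documents, 2 of which belong to a 6-document class.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
score = f_measure(p, r)
```

The harmonic mean penalizes a large gap between precision and recall, which is why a cluster that absorbs too many documents scores poorly even when its recall is high.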


Clustering Characteristics and Class Hierarchy Generation in Object-Oriented Development (객체지향개발에서의 속성 클러스터링과 클래스 계층구조생성)

  • Lee Gun Ho
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1443-1450 / 2004
  • Clustering characteristics into a number of classes and defining the inheritance relations between those classes is a difficult and complex problem in the early stage of object-oriented software development. We discuss a traditional iterative approach for reusing existing classes in a library and the integrated approach presented in this study for creating a number of new classes. This paper formulates the characteristic clustering problem as a zero-one integer programming problem and presents a network solution method, with illustrative examples and basic rules for defining the inheritance relations between the classes. The network solution method is based on a distance parameter between every pair of objects with characteristics. We apply the approach to a real problem taken from industry.

Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases / v.29 no.5 / pp.367-380 / 2002
  • Due to the recent increase in applications requiring huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing a hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods have mainly adopted the bottom-up approach, which creates the tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database to build the tree and need to search most nodes of the tree, since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses multidimensional indexes that are already maintained in most database applications. Generally, multidimensional indexes have the clustering property of storing similar objects in the same (or adjacent) data pages. Using this property, we can find adjacent objects without calculating distances among them. We first formally define a cluster based on the density of objects. For the definition, we propose the concept of the region contrast partition, based on the density of the region. To speed up the clustering algorithm, we use the branch-and-bound algorithm. We propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by a factor of 26 to 187, depending on the size of the database. As a result, we believe that the proposed method significantly improves clustering performance in large databases and is practically usable in various database applications.

Ant Colony Hierarchical Cluster Analysis (개미 군락 시스템을 이용한 계층적 클러스터 분석)

  • Kang, Mun-Su;Choi, Young-Sik
    • Journal of Internet Computing and Services / v.15 no.5 / pp.95-105 / 2014
  • In this paper, we present a novel ant-based hierarchical clustering algorithm, in which ants repeatedly hop from one node to another over a weighted directed k-nearest-neighbor graph obtained from a given dataset. We introduce the notion of node pheromone, which is the sum of the pheromone on the incoming arcs of a node. The node pheromone can be regarded as a relative density measure in a local region. After a finite number of ant hops, we remove nodes with a small amount of node pheromone from the directed graph and obtain a group of strongly connected components as clusters. We repeat this removal process from a low threshold value to a high one, yielding a hierarchy of clusters. We demonstrate the performance of the proposed algorithm on synthetic and real data sets, compared with traditional clustering methods. Experimental results show the superiority of the proposed method over the traditional methods.
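The thresholding step described in this abstract can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the ant-hopping simulation is skipped and the per-arc pheromone values are assumed to be given as a dict keyed by `(source, target)`; the function names and the use of Kosaraju's algorithm for the strongly connected components are choices made for this sketch.

```python
from collections import defaultdict


def node_pheromone(arc_pheromone):
    """Sum the pheromone on the incoming arcs of every node."""
    totals = defaultdict(float)
    for (src, dst), amount in arc_pheromone.items():
        totals[src] += 0.0  # make sure nodes without incoming arcs still appear
        totals[dst] += amount
    return dict(totals)


def strongly_connected_components(nodes, arcs):
    """Kosaraju's algorithm (iterative) on the subgraph induced by `nodes`."""
    forward, backward = defaultdict(list), defaultdict(list)
    for src, dst in arcs:
        if src in nodes and dst in nodes:
            forward[src].append(dst)
            backward[dst].append(src)

    # Pass 1: record nodes in order of depth-first finishing time.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:  # all neighbours explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing finishing time.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def clusters_at(arc_pheromone, threshold):
    """Drop nodes whose node pheromone is below `threshold`; return the remaining SCCs."""
    pheromone = node_pheromone(arc_pheromone)
    kept = {n for n, p in pheromone.items() if p >= threshold}
    return strongly_connected_components(kept, arc_pheromone.keys())


# Hypothetical pheromone values on a small directed graph.
arcs = {("a", "b"): 1.0, ("b", "a"): 1.0, ("b", "c"): 0.2,
        ("c", "d"): 1.0, ("d", "c"): 1.0, ("e", "a"): 0.5}
levels = [clusters_at(arcs, t) for t in (0.5, 1.1)]
```

Calling `clusters_at` with increasing thresholds gives one clustering per level, which is how the sketch mirrors the low-to-high threshold sweep that produces the hierarchy.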

A Hierarchy of Kernel PCM-Generated Clusters (계층적인 구조를 이루는 KPCM 알고리즘)

  • Koo Yang-Hyup;Choi Byung-In;Rhee Chung-Hoon
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.83-86 / 2005
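  • Clustering methods that use kernel functions are far more effective than general objective-function-based clustering methods when clustering data with complex shapes, such as rings. However, because kernel-based clustering methods must evaluate the kernel function to compute the distance function, their computational cost grows sharply compared with general objective-function-based clustering methods as the number of clusters increases. To address this drawback, this paper applies a hierarchical clustering model to the kernel-based clustering technique.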


A Neuro-Fuzzy Modeling using the Hierarchical Clustering and Gaussian Mixture Model (계층적 클러스터링과 Gaussian Mixture Model을 이용한 뉴로-퍼지 모델링)

  • Kim, Sung-Suk;Kwak, Keun-Chang;Ryu, Jeong-Woong;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.5 / pp.512-519 / 2003
  • In this paper, we propose a neuro-fuzzy modeling method that improves performance by using hierarchical clustering and a Gaussian Mixture Model (GMM). The hierarchical clustering algorithm produces unique parameters for the given data because it does not use an objective function to perform the clustering. After optimizing the obtained parameters using the GMM, we apply them as initial parameters for the Adaptive Network-based Fuzzy Inference System. Here, the number of fuzzy rules equals the number of clusters. In this way, we can improve the performance index and reduce the number of rules simultaneously. The proposed method is verified by applying it to neuro-fuzzy modeling of Box-Jenkins's gas furnace data and Sugeno's nonlinear system, where it yields better results than previous ones.

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management / v.18 no.2 / pp.203-230 / 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performances of the complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm, and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.
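As a rough illustration of the similarity measures and the complete linkage method named in this abstract, the sketch below clusters documents represented as term-weight dicts. It is a minimal, unoptimized version written for this listing: the document representation, the binary (set-based) form of the Jaccard coefficient, the stopping rule `n_clusters`, and all function names are assumptions, not details taken from the study.

```python
import math


def cosine(u, v):
    """Cosine coefficient between two term-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(u, v):
    """Jaccard coefficient on the sets of terms present (weights ignored)."""
    a, b = set(u), set(v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complete_linkage(docs, similarity, n_clusters):
    """Agglomerative clustering; two clusters are as similar as their LEAST similar pair."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(similarity(docs[a], docs[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or link > best[0]:
                    best = (link, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


# Hypothetical term-weight vectors for four short documents.
docs = [
    {"cluster": 2, "tree": 1},
    {"cluster": 1, "tree": 1},
    {"sensor": 3, "energy": 1},
    {"sensor": 1, "energy": 2},
]
groups = complete_linkage(docs, cosine, n_clusters=2)
```

Passing `jaccard` instead of `cosine` swaps the similarity measure without touching the linkage logic, which is the kind of comparison the abstract describes.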


A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB / v.10B no.1 / pp.33-40 / 2003
  • Data mining plays an important role in the knowledge discovery process, and various data mining algorithms can be selected for a specific purpose. Most traditional hierarchical clustering methods are suitable for processing small data sets, so they have difficulty handling large data sets because of limited resources and insufficient efficiency. In this study, we propose a hybrid neural network clustering technique, called PPC (Pre-Post Clustering), that can be applied to large data sets and can find unknown patterns. PPC combines an artificial intelligence method, SOM, with a statistical method, hierarchical clustering, and clusters data in two stages. In the pre-clustering stage, PPC digests large data sets using SOM. Then, in the post-clustering stage, PPC measures similarity values according to cohesive distances, which show inner features, and adjacent distances, which show external distances between clusters. Finally, PPC clusters the large data sets using the similarity values. Experiments with UCI repository data showed that PPC had better cohesive values than the other clustering techniques.

i-LEACH : Head-node Constrained Clustering Algorithm for Randomly-Deployed WSN (i-LEACH : 랜덤배치 고정형 WSN에서 헤더수 고정 클러스터링 알고리즘)

  • Kim, Chang-Joon;Lee, Doo-Wan;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.198-204 / 2012
  • Generally, the clustering of sensor nodes in a WSN is a useful mechanism that helps to cope with the scalability problem and, if combined with in-network data aggregation, may increase the energy efficiency of the network. The hierarchical clustering routing algorithm is a typical algorithm for enhancing the overall energy efficiency of the network; it selects a cluster head to send the aggregated data arriving from the nodes in the cluster to a base station. In this paper, we propose improved-LEACH (i-LEACH), which uses a comparably simple and lightweight policy to select cluster-head nodes, reducing the clustering overhead and the overall power consumption of the network. Using a fine-grained power model, the simulation results show that i-LEACH can reduce clustering overhead compared with well-known previous works such as LEACH. Compared with LEACH, i-LEACH improved network power consumption by 25% and network traffic by 16%.

Heuristic Algorithm for High-Speed Clustering of Neighbor Vehicular Position Coordinate (주변 차량 위치 좌표의 고속 클러스터링을 위한 휴리스틱 알고리즘)

  • Choi, Yoon-Ho;Yoo, Seung-Ho;Seo, Seung-Woo
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.4 / pp.343-350 / 2014
  • Divisive hierarchical clustering algorithms recursively repeat the process of decomposing and clustering data. In each recursive call, the data in each cluster are selected arbitrarily, which can increase the total clustering time and makes it difficult to apply such algorithms to clustering neighbor vehicular position data in vehicular localization. In this paper, we propose a new heuristic algorithm that reduces the clustering time by eliminating the randomness of the selected data when generating the initial divisive clusters.