• Title/Summary/Keyword: K-Means clustering algorithm

Analysis of Document Clustering Varying Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference
    • /
    • 2002.06c
    • /
    • pp.99-102
    • /
    • 2002
  • The K-means clustering algorithm is a popular clustering technique widely used in information retrieval. In this paper, we address a weakness of the K-means algorithm in how centroids are created, and we propose a method that determines new centroids by reflecting document features and considering the context of each document. For the experiments, we used an automatic document summarizer to summarize the Reuters-21578 newswire test dataset and achieved a 20% improvement in recall.

A Study on Optimizing the Number of Clusters using External Cluster Relationship Criterion (외부 군집 연관 기준 정보를 이용한 군집수 최적화)

  • Lee, Hyun-Jin;Jee, Tae-Chang
    • Journal of Digital Contents Society
    • /
    • v.12 no.3
    • /
    • pp.339-345
    • /
    • 2011
  • K-means is one of the most popular, simple, and fast clustering algorithms, but the right value of k is not known in advance. The value of k (the number of clusters) is critical because the clustering result depends on it. In this paper, we present a novel algorithm that determines the number of clusters dynamically, based on an external cluster relationship criterion, which is an evaluation metric for clustering results. Experimental results show that our algorithm estimates the number of clusters more accurately than other methods.

K-means Clustering using a Grid-based Sampling

  • Park, Hee-Chang;Lee, Sun-Myung
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.249-258
    • /
    • 2003
  • K-means clustering has been widely used in many applications, such as pattern analysis and recognition, data analysis, image processing, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm can take many hours to produce the desired k clusters because it is a primitive, exploratory method. In this paper, we propose a new k-means clustering method that uses grid-based sampling. It is faster than any traditional clustering method and maintains its accuracy.

K-means Clustering using a Grid-based Representatives

  • Park, Hee-Chang;Lee, Sun-Myung
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.229-238
    • /
    • 2003
  • K-means clustering has been widely used in many applications, such as pattern analysis, data analysis, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm can take many hours to produce k clusters because it is a primitive, exploratory method. In this paper, we propose a new k-means clustering method that uses grid-based representative values (the arithmetic mean and the trimmed mean) as the sample. It is faster than any traditional clustering method and maintains its accuracy.
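
The abstract does not say how the grid is built, so the following is only a rough sketch of the general idea: points are binned into an equal-width grid and each occupied cell is replaced by the arithmetic mean of its members. The function name `grid_representatives` and the parameter `cells_per_axis` are hypothetical, and the trimmed-mean variant is not shown.

```python
import numpy as np

def grid_representatives(points, cells_per_axis):
    """Return one arithmetic-mean representative per occupied grid cell.

    Assumption: an equal-width grid over the bounding box of the data.
    The paper's actual grid construction is not described in the abstract.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    # Avoid division by zero on axes where all values are identical.
    span = np.where(hi > lo, hi - lo, 1.0)
    idx = np.floor((points - lo) / span * cells_per_axis).astype(int)
    # Points on the upper boundary fall into the last cell.
    idx = np.clip(idx, 0, cells_per_axis - 1)

    cells = {}
    for key, p in zip(map(tuple, idx), points):
        cells.setdefault(key, []).append(p)
    return np.array([np.mean(members, axis=0) for members in cells.values()])
```

The representatives can then be passed to any k-means routine in place of the full dataset, which is where a speedup would come from when there are far fewer occupied cells than points.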

Differentially Private k-Means Clustering based on Dynamic Space Partitioning using a Quad-Tree (쿼드 트리를 이용한 동적 공간 분할 기반 차분 프라이버시 k-평균 클러스터링 알고리즘)

  • Goo, Hanjun;Jung, Woohwan;Oh, Seongwoong;Kwon, Suyong;Shim, Kyuseok
    • Journal of KIISE
    • /
    • v.45 no.3
    • /
    • pp.288-293
    • /
    • 2018
  • Several recent studies have investigated how to apply privacy-preserving techniques when publishing data. Differential privacy can protect personal information regardless of an attacker's background knowledge by adding probabilistic noise to the original data. To perform differentially private k-means clustering, the existing algorithm builds a differentially private histogram and then performs k-means clustering on it. Because it constructs an equi-width histogram without considering the distribution of the data, there are many buckets to which noise must be added. We propose a k-means clustering algorithm using a quad-tree that captures the distribution of the data with a small number of buckets. Our experiments show that the proposed algorithm outperforms the existing algorithm.
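
To make the baseline concrete, here is a minimal sketch of a noisy equi-width histogram, assuming 2-D points in the unit square and the standard Laplace mechanism with scale 1/epsilon per bucket count. The function name and parameters are illustrative assumptions; the quad-tree method proposed in the paper is not implemented here.

```python
import numpy as np

def noisy_equiwidth_histogram(points, bins, epsilon, rng):
    """Equi-width 2-D histogram with Laplace noise added to every bucket.

    Assumptions: points lie in [0, 1] x [0, 1], and adding or removing one
    record changes exactly one bucket count by 1 (sensitivity 1).
    """
    points = np.asarray(points, dtype=float)
    counts, xedges, yedges = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    # Every one of the bins * bins buckets receives noise, including empty ones.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise, xedges, yedges
```

With a fine grid, most buckets are empty yet still receive noise, which illustrates the motivation the abstract gives for adapting the partitioning to the data distribution.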

The Enhancement of Learning Time in Fuzzy c-means algorithm (학습시간을 개선한 Fuzzy c-means 알고리즘)

  • 김형철;조제황
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2001.06a
    • /
    • pp.113-116
    • /
    • 2001
  • The conventional K-means algorithm is widely used in vector quantizer design and clustering analysis. Recently, a modified K-means algorithm has been proposed in which the codevector updating step is as follows: new codevector = current codevector + scale factor × (new centroid − current codevector). This algorithm uses a fixed value for the scale factor. In this paper, we propose a new algorithm that improves the learning time of the fuzzy c-means algorithm. Experimental results show that the proposed method produces codebooks about 5 to 6 times faster than the conventional K-means algorithm with almost the same performance.
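
The codevector update rule quoted in the abstract can be sketched as one iteration of a scaled K-means step. This is an illustration of the fixed-scale-factor rule only, not the fuzzy c-means method the paper proposes; the function name `update_codebook` and the parameter `scale` are hypothetical.

```python
import numpy as np

def update_codebook(data, codebook, scale=1.0):
    """One iteration of the scaled codevector update.

    new codevector = current codevector + scale * (new centroid - current codevector)
    With scale = 1.0 this reduces to the conventional K-means update.
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Assign each training vector to its nearest codevector (Euclidean distance).
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    new_codebook = codebook.copy()
    for j in range(len(codebook)):
        members = data[labels == j]
        if len(members) == 0:
            continue  # leave codevectors with no assigned vectors unchanged
        centroid = members.mean(axis=0)
        new_codebook[j] = codebook[j] + scale * (centroid - codebook[j])
    return new_codebook, labels
```

A scale factor above 1 moves each codevector past the new centroid, while a value below 1 moves it only part of the way there.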

A New Model-Based Clustering Algorithm (새로운 모형기반 군집분석 알고리즘)

  • Park, Jeong-Su;Hwang, Hyeon-Sik
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.11a
    • /
    • pp.97-100
    • /
    • 2005
  • A new model-based clustering algorithm is proposed. The idea starts from the assumption that observations are realizations of Gaussian processes and are therefore correlated. With a special covariance structure, the posterior probability that an observation belongs to each cluster is computed using the ECM algorithm. Preliminary results from a small-scale simulation study are given for comparison with the k-means clustering algorithm.

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management
    • /
    • v.18 no.2
    • /
    • pp.203-230
    • /
    • 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria and term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performances of the complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.
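
The two similarity measures named in the abstract can be sketched as follows. The representation is an assumption: each document is a dict mapping terms to weights, cosine is computed on the weights, and Jaccard is computed on the plain term sets. The study may well have used a weighted Jaccard variant instead.

```python
import math

def cosine(a, b):
    """Cosine coefficient between two term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Jaccard coefficient on the sets of terms (weights are ignored)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)
```

For two documents that share one of three distinct terms with unit weights, such as `{"x": 1, "y": 1}` and `{"y": 1, "z": 1}`, the cosine coefficient is 0.5 and the Jaccard coefficient is 1/3.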

Fusion of Background Subtraction and Clustering Techniques for Shadow Suppression in Video Sequences

  • Chowdhury, Anuva;Shin, Jung-Pil;Chong, Ui-Pil
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.14 no.4
    • /
    • pp.231-234
    • /
    • 2013
  • This paper introduces a combination of a background subtraction technique and the K-means clustering algorithm for removing shadows from video sequences. Lighting conditions cause problems for segmentation. The proposed method can successfully remove artifacts associated with lighting changes, such as highlights and reflections, as well as cast shadows of moving objects, from the segmentation. In this paper, the K-means clustering algorithm is applied to the foreground, which is initially segmented by the background subtraction technique. The estimated shadow region is then superimposed on the background to eliminate the effects that cause redundancy in object detection. Simulation results show that the proposed approach is capable of removing shadows and reflections from moving objects with an accuracy of more than 95% in every case considered.

Classification of Volatile Chemicals using Fuzzy Clustering Algorithm (퍼지 Clustering 알고리즘을 이용한 휘발성 화학물질의 분류)

  • Byun, Hyung-Gi;Kim, Kab-Il
    • Proceedings of the KIEE Conference
    • /
    • 1996.07b
    • /
    • pp.1042-1044
    • /
    • 1996
  • Fuzzy theory in pattern recognition may be applicable to the classification and recognition of gases and odours. This paper reports results obtained by applying fuzzy c-means algorithms to patterns of volatile chemicals generated by an odour sensing system that uses an array of conducting polymer sensors. For the volatile chemical clustering problem, three unsupervised fuzzy c-means algorithms were applied. Among these pattern clustering methods, the FCMAW algorithm, which updates the cluster centres more frequently, consistently outperformed the others. It was confirmed as an outstanding clustering algorithm throughout the experimental trials.
