• Title/Summary/Keyword: K-Means algorithm


Extensions of X-means with Efficient Learning the Number of Clusters (X-means 확장을 통한 효율적인 집단 개수의 결정)

  • Heo, Gyeong-Yong;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.4 / pp.772-780 / 2008
  • K-means is one of the simplest unsupervised learning algorithms for solving the clustering problem. However, K-means has a basic shortcoming: the number of clusters k has to be known in advance. In this paper, we propose extensions of X-means, which can estimate the number of clusters using the Bayesian information criterion (BIC). We introduce two versions of the algorithm, modified X-means (MX-means) and generalized X-means (GX-means). Both employ one full covariance matrix per cluster and so can estimate the number of clusters efficiently, without the severe over-fitting that X-means suffers from due to its spherical cluster assumption. The algorithms start with one cluster and iteratively try to split a cluster to maximize the BIC score. MX-means uses the K-means algorithm to find a set of optimal clusters for the current k, which makes it simple and fast; however, it produces wrongly estimated centers when the clusters overlap. GX-means uses the EM algorithm to estimate the parameters and produces more stable clusters even when the clusters overlap. Experiments with synthetic data show that the proposed methods provide a robust estimate of the number of clusters and cluster parameters compared to other existing top-down algorithms.
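
The BIC-based choice of k described in this abstract can be illustrated with a minimal sketch. It is not the authors' MX-means or GX-means: it assumes 1-D data, scans every k from 1 to a hypothetical `k_max` instead of splitting clusters top-down, uses hard K-means assignments rather than EM, and gives each cluster its own variance (the 1-D analogue of one full covariance matrix per cluster). The function names, the quantile-based seeding, and the variance floor are all assumptions made for illustration.

```python
import math

def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's K-means on 1-D data with deterministic, quantile-based seeds."""
    data = sorted(values)
    n = len(data)
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as its group mean; keep the old center if the group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return [g for g in groups if g]    # drop empty clusters

def bic_score(groups, var_floor=1e-6):
    """BIC (higher is better) for hard clusters, each with its own mean and variance."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    log_lik = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        var = max(sum((v - mean) ** 2 for v in g) / len(g), var_floor)
        # With the maximum-likelihood variance (floor not hit), the squared-error
        # term of the Gaussian log-density sums to 1/2 per point.
        log_lik += len(g) * (math.log(len(g) / n) - 0.5 * math.log(2 * math.pi * var) - 0.5)
    n_params = (k - 1) + 2 * k         # mixing weights + means + variances
    return log_lik - 0.5 * n_params * math.log(n)

def choose_k(values, k_max):
    """Return the k in 1..k_max whose K-means partition has the highest BIC."""
    return max(range(1, k_max + 1), key=lambda k: bic_score(kmeans_1d(values, k)))
```

On two well-separated groups such as `[0.0, 0.5, 1.0, 1.5, 2.0]` and `[10.0, 10.5, 11.0, 11.5, 12.0]`, `choose_k(data, 4)` selects k = 2: the likelihood gain from further splits is smaller than the added parameter penalty.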

Fault Detection of Ceramic Imaging using K-means Algorithm (K-means 알고리즘을 이용한 세라믹 영상에서의 결함 검출)

  • Kim, Kwang Beak;Woo, Young Woon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.275-277 / 2014
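  • In this paper, we apply Gaussian filtering to ceramic material images to remove noise, apply the K-means algorithm to segment the defect regions, extract the defect regions from the segmented result using Max-Min binarization, and then remove the remaining noise with morphological operations to extract the defects. Experiments on ceramic material images confirm that the proposed method detects defects more efficiently than existing methods.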


Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm (K-Means 알고리즘을 이용한 계층적 클러스터링에서 클러스터 계층 깊이와 초기값 선정)

  • Lee, Shin-Won;An, Dong-Un;Chong, Sung-Jong
    • Journal of the Korean Society for Information Management / v.21 no.4 s.54 / pp.173-185 / 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means, even with a large number of variables, has a time complexity that is linear in the number of documents, but it is thought to produce inferior clusters. In this paper, we compare the Condor system, which uses the K-means algorithm, with the regular method; with the initial centroids established in advance, the performance of our method improves considerably.

A Study of Similar Blog Recommendation System Using Termite Colony Algorithm (흰개미 군집 알고리즘을 이용한 유사 블로그 추천 시스템에 관한 연구)

  • Jeong, Gi Sung;Jo, I-Seok;Lee, Malrey
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.83-88 / 2013
  • This paper proposes a system that recommends similar blogs, grouping blogs by a similarity computed from the frequencies of the words extracted from each blog. To improve clustering performance, the k-means algorithm is modified using a model of termite behavior, and an evaluation against the existing k-means algorithm shows that the improved algorithm clusters better. The similar-blog recommendation system was designed and implemented using the improved algorithm. Compared with the K-means algorithm, TCA reduces the clustering time and the number of moves needed for clustering.

A New Fast EM Algorithm (새로운 고속 EM 알고리즘)

  • 김성수;강지혜
    • Journal of KIISE:Computer Systems and Theory / v.31 no.10 / pp.575-587 / 2004
  • In this paper, a new Fast Expectation-Maximization (FEM) algorithm is proposed. First, the K-means algorithm is modified to reduce the number of iterations needed to find the initial values used in the EM process. Conventionally, the initial values in K-means clustering are chosen randomly, which sometimes forces the clustering process to converge to undesired center points. A uniform partitioning method is added to conventional K-means to extract proper initial points for each cluster. Second, the effect of the posterior probability is emphasized so that applying the Maximum Likelihood Posterior (MLP) yields fast convergence. The proposed FEM strengthens conventional EM by speeding up its convergence. Experimental results demonstrate the superiority of FEM, showing improved EM results and faster convergence in parameter estimation.

A Fast K-means and Fuzzy-c-means Algorithms using Adaptively Initialization (적응적인 초기치 설정을 이용한 Fast K-means 및 Fuzzy-c-means 알고리즘)

  • 강지혜;김성수
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.516-524 / 2004
  • In this paper, the initial value problem in clustering with K-means or Fuzzy-c-means is considered, with the aim of reducing the number of iterations. Conventionally, the initial values are chosen randomly, which sometimes causes the clustering process to converge to undesired center points. Clustering with K-means or Fuzzy-c-means is sensitive to the choice of initial values, and this choice has long been a well-known open problem. As an approach to the problem, the uniform partitioning method is employed to extract an optimal initial point for each cluster of data. Experimental results demonstrate the superiority of the proposed method, which reduces the number of iterations needed to find the center points of the clusters.
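
The abstract does not spell out its uniform partitioning method, so the following is only one plausible reading, sketched for 1-D data: split the value range into k equal-width bins, seed K-means with the mean of each non-empty bin, and count the Lloyd iterations until the centers stop changing. Both function names and the equal-width binning are assumptions, not the authors' procedure.

```python
def uniform_partition_init(values, k):
    """Split [min, max] into k equal-width bins; return the mean of each non-empty bin.

    Returns fewer than k seeds if some bins are empty.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0       # avoid a zero width when all values are equal
    bins = [[] for _ in range(k)]
    for v in values:
        idx = min(int((v - lo) / width), k - 1)   # clamp the maximum into the last bin
        bins[idx].append(v)
    return [sum(b) / len(b) for b in bins if b]

def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's iterations on 1-D data; returns (centers, number of iterations run)."""
    centers = list(centers)
    for iteration in range(1, max_iter + 1):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as its group mean; keep the old center if the group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            return centers, iteration
        centers = updated
    return centers, max_iter
```

Calling `kmeans_1d(data, uniform_partition_init(data, k))` and comparing the returned iteration count against a run seeded with randomly chosen points gives a rough feel for the effect the abstract reports.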

Creation of Frequent Patterns using K-means Algorithm for Data Mining Preprocess (데이터 마이닝의 전처리를 위한 K-means 알고리즘을 이용한 빈발패턴 생성)

  • Heui-Jong Yoo;Chi-Yeon Park
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.336-339 / 2008
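  • The databases we use hold large amounts of data, and the volume keeps growing. Unlike the basic, simple information that can be obtained from such data through queries, data mining is a way to obtain high-level information. Among data mining techniques, this paper uses the k-means algorithm to cluster transactions, thereby reducing the number of transactions in the database, in order to mitigate the performance degradation caused by transaction scans, a drawback of Apriori, the representative association rule algorithm.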

KMSVDD: Support Vector Data Description using K-means Clustering (KMSVDD: K-means Clustering을 이용한 Support Vector Data Description)

  • Kim, Pyo-Jae;Chang, Hyung-Jin;Song, Dong-Sung;Choi, Jin-Young
    • Proceedings of the KIEE Conference / 2006.04a / pp.90-92 / 2006
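  • The conventional Support Vector Data Description (SVDD) method has been limited in learning from large amounts of data because its training time grows exponentially as the number of training samples increases. In this paper, we propose an SVDD algorithm that uses the K-means clustering algorithm to speed up training. Similar to existing decomposition methods, the proposed algorithm uses K-means clustering to divide the training data region into sub-groups and then trains each sub-group separately, which reduces the computational load. This sub-grouping does not impair the learning characteristic of SVDD, which encloses the training data with a hypersphere, and lets the method learn from small regions of training data gathered around the centroids, enabling faster training than conventional SVDD with no difference in accuracy. The effect is verified through simulations on a variety of data sets.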


A Study on Initial Seeds Selection of K-Means for Big Data Clustering (빅데이터 클러스터링을 위한 K-Means 초기 중심 선정 연구)

  • Kim, Yeong-Ju;Heo, Yu-Gyeong;Back, Jong-Sang;Jeong, Hwan-Jong;Lee, Sung-Ro;Jung, Min-A
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.750-752 / 2014
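  • The K-Means algorithm is easy to implement and has a time complexity of O(n) for n patterns, so it is widely used on large data sets. However, the choice of initial cluster centers determines both the number of assignment-recomputation iterations and the clustering result. This paper reviews studies on selecting initial cluster centers for K-Means and proposes a K-Means initial center selection method that applies systematic random sampling. The proposed method can reduce clustering time and improve accuracy on large data sets.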
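
Systematic sampling, the technique this abstract applies to seed selection, picks a random starting offset and then takes every step-th item. The sketch below shows that general technique only; the function name `systematic_seeds`, the use of the data in its stored order, and the interval `n // k` are assumptions, since the abstract does not give the authors' exact procedure.

```python
import random

def systematic_seeds(points, k, rng=None):
    """Pick k initial centroids by systematic sampling over the stored order of points."""
    rng = rng or random.Random()
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")
    step = n // k                      # sampling interval
    start = rng.randrange(step)        # random offset inside the first interval
    # The last index is start + (k - 1) * step <= k * step - 1 <= n - 1, so it is always valid.
    return [points[start + i * step] for i in range(k)]
```

Selecting the seeds takes k index lookups, so it adds no pass over the data before the first K-means assignment step.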

A Novel Approach towards use of Adaptive Multiple Kernels in Interval Type-2 Possibilistic Fuzzy C-Means (적응적 Multiple Kernels을 이용한 Interval Type-2 Possibilistic Fuzzy C-Means 방법)

  • Joo, Won-Hee;Rhee, Frank Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.529-535 / 2014
  • In this paper, we propose a hybrid approach, multiple kernels interval type-2 possibilistic fuzzy C-means (IT2PFCM-MK), based on interval type-2 possibilistic fuzzy C-means (IT2PFCM) and possibilistic fuzzy C-means using multiple kernels (PFCM-MK). With noisy data or overlapping cluster prototypes, fuzzy C-means performs poorly in comparison to possibilistic fuzzy C-means (PFCM). Moreover, to address the uncertainty associated with the fuzzifier parameter m, interval type-2 possibilistic fuzzy C-means (IT2PFCM) is used. Most practical data are complex and not linearly separable, and in such cases Gaussian kernels prove helpful. Therefore, to overcome all these issues, we integrate PFCM-MK into IT2PFCM and propose the idea of multiple kernels based interval type-2 possibilistic fuzzy C-means (IT2PFCM-MK).