• Title/Summary/Keyword: Cluster Centroids


An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

  • Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
    • The KIPS Transactions:PartB
    • /
    • v.11B no.7 s.96
    • /
    • pp.867-874
    • /
    • 2004
  • The K-Means algorithm is a non-hierarchical (flat) reassignment technique that iterates its steps on the basis of K cluster centroids until the clustering results converge into K clusters. By its nature, the K-Means algorithm produces different results depending on the initial and new centroids. In this paper, we propose a modified K-Means algorithm that improves the initial- and new-centroid decision methodologies. Evaluating the two algorithms with the 16 weighting schemes of the SMART system, the modified algorithm showed 20% better recall and F-measure than the K-Means algorithm, and the document clustering results were considerably improved.
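The flat reassignment loop described in this abstract can be sketched as follows. This is a minimal illustrative implementation of ordinary K-means (not the paper's modified centroid-decision method); the naive "first k points" initialization stands in for whatever initialization strategy is used.

```python
# Minimal flat K-means sketch: assign each point to its nearest centroid,
# recompute centroids as cluster means, and repeat until assignments converge.
def kmeans(points, k, max_iter=100):
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # reassignment step: nearest centroid by squared Euclidean distance
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, assignment
```

Because the initial centroids determine which local optimum the loop converges to, improving their selection (the subject of this paper) directly changes the final clusters.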

Analysis of Document Clustering Varing Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference
    • /
    • 2002.06c
    • /
    • pp.99-102
    • /
    • 2002
  • The K-means clustering algorithm is a very popular clustering technique used in the field of information retrieval. In this paper, we address the problem of how the K-means algorithm creates centroids, and suggest a method that reflects document features and considers the context of each document when determining the new centroids. For the experiment, we used an automatic document summarizer on the Reuters-21578 newswire test dataset and achieved 20% improved recall.


An Ensemble Clustering Algorithm based on a Prior Knowledge (사전정보를 활용한 앙상블 클러스터링 알고리즘)

  • Ko, Song;Kim, Dae-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.2
    • /
    • pp.109-121
    • /
    • 2009
  • Although prior knowledge is a factor that can improve clustering performance, the improvement depends on how it is used. In particular, when prior knowledge is employed to construct the initial centroids of cluster groups, the similarities within that prior knowledge must be considered: even if some objects carry identical labels, objects with low similarity should be separated. Separating them keeps the initial centroids free of the collision problem caused by grouping objects with low similarity. The separated prior knowledge can then be used in various ways, such as different initializations. Applying association rules, the proposed method creates a sufficient number of cluster groups so that the initial centroids can be constructed from the separated prior knowledge. An ensemble of the various results then outperforms the case where the prior knowledge is not separated.

A Study on Multi-Dimensional Entity Clustering Using the Objective Function of Centroids (중심체 목적함수를 이용한 다차원 개체 CLUSTERING 기법에 관한 연구)

  • Rhee, Chul;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.15 no.2
    • /
    • pp.1-15
    • /
    • 1990
  • A mathematical definition of the cluster is suggested. A nonlinear 0-1 integer programming formulation for the multi-dimensional entity clustering problem is developed. A heuristic method named MDEC (Multi-Dimensional Entity Clustering), using centroids and binary partitioning, is developed, and numerical examples are shown. This method has the advantage of providing bottleneck entity information.


Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm (K-Means 알고리즘을 이용한 계층적 클러스터링에서 클러스터 계층 깊이와 초기값 선정)

  • Lee, Shin-Won;An, Dong-Un;Chong, Sung-Jong
    • Journal of the Korean Society for information Management
    • /
    • v.21 no.4 s.54
    • /
    • pp.173-185
    • /
    • 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, even with a large number of variables, K-means has a time complexity linear in the number of documents, but is thought to produce inferior clusters. In this paper, the Condor system using the K-Means algorithm is compared with the conventional method in which the initial centroids are established in advance; our method's performance is considerably improved.

Approximate k values using Repulsive Force without Domain Knowledge in k-means

  • Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.3
    • /
    • pp.976-990
    • /
    • 2020
  • The k-means algorithm is widely used in academia and industry due to its easy and simple implementation, enabling fast learning on complex datasets. However, k-means struggles to classify datasets without prior knowledge of the specific domain. In a previous study we proposed the repulsive k-means (RK-means) algorithm, which improves k-means by using a repulsive-force concept that allows unnecessary cluster centroids to be deleted; RK-means can therefore classify a dataset without domain knowledge. However, three main problems remain: the RK-means algorithm includes a cluster repulsive-force offset for clusters confined within other clusters, which can cause cluster locking; the previous study could not prove that RK-means provides optimal convergence; and RK-means showed better performance only for particular normalization terms and weights. Therefore, this paper proposes the advanced RK-means (ARK-means) algorithm to resolve these problems. We establish an initialization strategy for deploying cluster centroids and define a metric for the ARK-means algorithm. Finally, we redefine the mass and normalization terms to fit general datasets. We demonstrate the feasibility of ARK-means experimentally on the blob and iris datasets. The experimental results verify that the proposed ARK-means algorithm performs better than k-means, k'-means, and RK-means.

A Fuzzy Clustering Algorithm for Clustering Categorical Data (범주형 데이터의 분류를 위한 퍼지 군집화 기법)

  • Kim, Dae-Won;Lee, Kwang-H.
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.6
    • /
    • pp.661-666
    • /
    • 2003
  • In this paper, the conventional k-modes and fuzzy k-modes algorithms for clustering categorical data are extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithms. The hard-type centroids of the traditional algorithms have difficulty with ambiguous boundary data, which may be misclassified and lead to local optima. Fuzzy centroids make it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. The distance measure between data and fuzzy centroids is more precise and effective than those of k-modes and fuzzy k-modes. To test the proposed approach, the proposed algorithm and the two conventional algorithms were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results.
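The idea of a fuzzy centroid for categorical data can be sketched as follows: instead of one hard mode per attribute (as in k-modes), each attribute of the centroid becomes a distribution over categories, weighted by the objects' membership degrees. The function names and the simple dissimilarity below are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

def fuzzy_centroid(objects, memberships):
    """Per-attribute category distribution, weighted by membership degrees."""
    n_attrs = len(objects[0])
    centroid = []
    for a in range(n_attrs):
        weights = defaultdict(float)
        for obj, u in zip(objects, memberships):
            weights[obj[a]] += u  # accumulate membership mass per category
        total = sum(weights.values())
        centroid.append({cat: w / total for cat, w in weights.items()})
    return centroid

def dissimilarity(obj, centroid):
    """Per attribute: 1 minus the centroid's weight on the object's category.

    Unseen categories get weight 0, i.e. maximal dissimilarity on that attribute.
    """
    return sum(1.0 - attr_dist.get(obj[a], 0.0)
               for a, attr_dist in enumerate(centroid))
```

Unlike a hard mode, this centroid retains how strongly each category is represented, so boundary objects whose categories are split across clusters contribute graded rather than all-or-nothing distances.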

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.343-352
    • /
    • 2000
  • In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm. Step I: Carry out ordinary K-means clustering and obtain $k$ temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-1: Allocate the observation $x_i$ to the cluster $F$ if it satisfies ..., where $N$ is the total number of observations, for $i = 1, \ldots, N$. Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible. Double K-means clustering is nearly "optimal" under a mixture of $k$ multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is not required. The method is numerically demonstrated on Fisher's iris data.


Analysis of Partial Discharge Pattern in XLPE/EPDM Interface Defect using the Cluster (군집화에 의한 XLPE/EPDM 계면결함 부분방전 패턴 분석)

  • Cho, Kyung-Soon;Lee, Kang-Won;Shin, Jong-Yeol;Hong, Jin-Woong
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2007.11a
    • /
    • pp.203-204
    • /
    • 2007
  • This paper investigated the influence of various defects at the model power cable joint interface on the partial discharge distribution, using K-means clustering. Analyzing the discharge-number distribution of the $\Phi$-$n$ clusters, the clusters shifted toward $0^\circ$ and $180^\circ$ with increasing applied voltage. From the centroid distribution of the $\Phi$-$q$ clusters, it was confirmed that the discharge quantity and the Euclidean distance between centroids increased with applied voltage. The degree of dispersion, computed as the standard deviation of the $\Phi$-$q$ cluster centroids, also increased. The number of discharges and the mean value of the $\Phi$-$q$ cluster centroids showed somewhat different tendencies depending on the defect type.


Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.3
    • /
    • pp.397-402
    • /
    • 2007
  • In this paper, we present a method for clustering incomplete microarray data using alternating optimization, in which no prior imputation method is required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach that finds better estimates during the iterative clustering process. This method improves the estimates of missing values by exploiting cluster information, such as cluster centroids and all available non-missing values, in each iteration. The clustering results of the proposed method are significantly more relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
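The core idea in this abstract, re-estimating a missing value from the centroid of the row's current cluster on each iteration instead of imputing once up front, can be sketched as follows. The helper names and the attribute-wise mean are illustrative assumptions, not the paper's exact update rule.

```python
def centroid_of(rows):
    """Attribute-wise mean over the non-missing (non-None) entries of each column."""
    n_attrs = len(rows[0])
    centroid = []
    for a in range(n_attrs):
        vals = [r[a] for r in rows if r[a] is not None]
        centroid.append(sum(vals) / len(vals) if vals else 0.0)
    return centroid

def impute_from_centroid(row, centroid):
    """Fill each missing entry with the corresponding centroid coordinate.

    Called once per clustering iteration, so the estimates improve as the
    cluster memberships (and hence centroids) stabilize.
    """
    return [c if v is None else v for v, c in zip(row, centroid)]
```

Because the fill-in values are recomputed from the latest centroids each iteration, the estimates track the evolving cluster structure rather than a single preprocessing-time guess.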