Search | Korea Science

Fast Search Algorithm for Determining the Optimal Number of Clusters using Cluster Validity Index (클러스터 타당성 평가기준을 이용한 최적의 클러스터 수 결정을 위한 고속 탐색 알고리즘)

Lee, Sang-Wook
- The Journal of the Korea Contents Association, v.9 no.9, pp.80-89, 2009
A fast and efficient search algorithm for determining the optimal number of clusters in clustering algorithms is presented. The method is based on a cluster validity index, which measures clustering optimality. As the clustering procedure progresses and reaches an optimal cluster configuration, the cluster validity index is expected to reach its minimum or maximum. In this paper, a fast non-exhaustive search method for finding the optimal number of clusters is designed and shown to work well in clustering. The proposed algorithm is implemented with the k-means++ algorithm as the underlying clustering technique, using CB and PBM as cluster validity indices. Experimental results on several artificial and real-life data sets show that the proposed method reduces computation time without loss of accuracy.
https://doi.org/10.5392/JKCA.2009.9.9.080
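As a rough illustration of validity-guided selection of the number of clusters, the sketch below runs k-means with k-means++ seeding for each candidate k and keeps the k with the largest PBM index. It is an exhaustive baseline scan, not the paper's non-exhaustive search, whose details the abstract does not give. The function names, the toy data, and the restart count are assumptions made for illustration, and the PBM formula follows the commonly cited definition.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans_pp(points, k, rng, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns (centers, labels)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Draw the next seed with probability proportional to the squared
        # distance to the closest seed chosen so far.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def pbm_index(points, centers, labels):
    """PBM index (larger is better): ((E_1 / E_K) * D_K / K) ** 2."""
    k = len(centers)
    g = centroid(points)
    e_1 = sum(math.dist(p, g) for p in points)            # spread around the global mean
    e_k = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
    d_k = max(math.dist(a, b) for a in centers for b in centers)
    return (e_1 * d_k / (k * e_k)) ** 2

def best_k_by_scan(points, k_values, restarts=5, seed=0):
    """Try every k in k_values and return (best k, scores per k)."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        scores[k] = max(pbm_index(points, *kmeans_pp(points, k, rng))
                        for _ in range(restarts))
    return max(scores, key=scores.get), scores

def demo_points():
    """Three small, well-separated groups of five points each (toy data)."""
    offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
    return [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 0), (5, 9)] for dx, dy in offsets]

best, scores = best_k_by_scan(demo_points(), range(2, 7))
print(best, scores)
```

The scan starts at k = 2 because the PBM index is zero for a single cluster, where the largest distance between centers is zero. A non-exhaustive method would evaluate only some of these candidates.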

Performance Comparison of Clustering Validity Indices with Business Applications (경영사례를 이용한 군집화 유효성 지수의 성능비교)

Lee, Soo-Hyun;Jeong, Youngseon;Kim, Jae-Yun
- Journal of the Korean Operations Research and Management Science Society, v.41 no.2, pp.17-33, 2016
Clustering is one of the leading methods for analyzing big data and is used in many different fields. This study deals with the Clustering Validity Index (CVI), which is used to verify the effectiveness of clustering results. We compare the performance of CVIs on business applications from various fields. The CVIs compared are DU, CH, DB, SVDU, SVCH, and SVDB. The first three are well known from existing research, and the last three are based on support vector data description. The results show that the CVIs based on support vector data description perform outstandingly and are suitable for practical application.
https://doi.org/10.7737/JKORMS.2016.41.2.017

VS-FCM: Validity-guided Spatial Fuzzy c-Means Clustering for Image Segmentation

Kang, Bo-Yeong;Kim, Dae-Won
- International Journal of Fuzzy Logic and Intelligent Systems, v.10 no.1, pp.89-93, 2010
In this paper, a new fuzzy clustering approach to the color clustering problem is proposed. To address the limitations of the traditional FCM algorithm, we propose a spatial homogeneity-based FCM algorithm. Moreover, a cluster validity index is employed to automatically determine the number of clusters for a given image. We refer to this method as the VS-FCM algorithm. The effectiveness of the proposed method is demonstrated through various clustering examples.
https://doi.org/10.5391/IJFIS.2010.10.1.089

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
- Asia Pacific Journal of Information Systems, v.16 no.1, pp.127-144, 2006
The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature on the basis of their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. The experiment also analyzes how varying the number of points, attributes, and clusters affects performance. The simulation results show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
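For readers unfamiliar with the two indices singled out in this abstract, the following is a minimal sketch of the Calinski-Harabasz index (larger is better) and the Davies-Bouldin index (smaller is better) using their standard textbook definitions. The function names and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math

def _groups(points, labels):
    """Collect the points of each cluster into a list."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return list(groups.values())

def _mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom. Needs 2 <= k < n."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    g = _mean(points)
    between = sum(len(m) * math.dist(_mean(m), g) ** 2 for m in groups)
    within = sum(math.dist(p, _mean(m)) ** 2 for m in groups for p in m)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    """Average, over clusters, of the worst ratio of summed scatter
    to centroid separation. Needs at least two clusters."""
    groups = _groups(points, labels)
    centers = [_mean(m) for m in groups]
    scatter = [sum(math.dist(p, c) for p in m) / len(m)
               for m, c in zip(groups, centers)]
    k = len(groups)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Toy data: two unit squares far apart, with a sensible and a poor labelling.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
poor = [0, 1, 0, 1, 0, 1, 0, 1]
print(calinski_harabasz(points, good), calinski_harabasz(points, poor))
print(davies_bouldin(points, good), davies_bouldin(points, poor))
```

On this toy data the sensible labelling scores higher on Calinski-Harabasz and lower on Davies-Bouldin than the poor one, which is the direction each index is meant to reward.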

A Cluster Validity Index for Fuzzy Clustering

Lee, Haiyoung
- Journal of the Korean Institute of Intelligent Systems, v.9 no.6, pp.621-626, 1999
In this paper, a new cluster validity index is proposed. It is heuristic, but it eliminates the monotonically decreasing tendency that occurs when the number of clusters c becomes very large and approaches the number of data points n. We review the FCM algorithm and some conventional cluster validity criteria, discuss the limiting behavior of the proposed validity index, and provide numerical examples showing its effectiveness.
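The monotone tendency mentioned in this abstract can be seen from the FCM objective itself. The abstract does not say which conventional criteria are reviewed or what form the new index takes, so the Xie-Beni index below is shown only as one well-known example of a criterion built on the same compactness term; the symbols are the usual ones (memberships u_ik, centers v_i, fuzzifier m) and are assumptions here.

```latex
% FCM objective for c clusters, n data points, fuzzifier m > 1
J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^2,
\qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k .

% Xie-Beni index: compactness (J_2) divided by n times the minimum center separation
V_{XB} = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{2} \, \lVert x_k - v_i \rVert^2}
              {n \, \min_{i \neq j} \lVert v_i - v_j \rVert^2}
```

As c approaches n, every data point can sit on its own center, so the compactness term in the numerator tends to zero. An index dominated by that term therefore tends to keep decreasing for large c, regardless of the true structure of the data.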

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

허명회;이용구
- The Korean Journal of Applied Statistics, v.17 no.1, pp.135-144, 2004
We propose a reproducibility (validity) assessment procedure for K-means cluster analysis that randomly partitions the data set into three parts: two subsets are used for developing clustering rules and one subset for testing the consistency of those rules. Also, as an alternative to the Rand index and the corrected Rand index, we propose an entropy-based consistency measure between two clustering rules and apply it to determining the number of clusters in K-means clustering.
https://doi.org/10.5351/KJAS.2004.17.1.135
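The two baseline measures named in this abstract can be sketched as follows: the Rand index counts point pairs on which two labelings agree, and the corrected (adjusted) Rand index rescales it so that chance agreement scores about zero. The entropy-based measure proposed in the paper is not reproduced here because the abstract does not define it. Function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(a, b):
    """Fraction of point pairs treated the same way (together or apart)
    by labelings a and b. Needs at least two points."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def adjusted_rand_index(a, b):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(a)
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = same_a * same_b / comb(n, 2)
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (same_both - expected) / (max_index - expected)

# Relabelled but identical partitions agree perfectly.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]), adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Crossed partitions agree on only a third of the pairs.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]), adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

In a reproducibility check of the kind described, `a` and `b` would be the labels that two independently developed clustering rules assign to the same held-out subset.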

Nearest neighbor and validity-based clustering

Son, Seo H.;Seo, Suk T.;Kwon, Soon H.
- International Journal of Fuzzy Logic and Intelligent Systems, v.4 no.3, pp.337-340, 2004
The clustering problem can be formulated as the problem of finding the number of clusters and a partition matrix from a given data set using iterative or non-iterative algorithms. The authors propose a nearest neighbor and validity-based clustering algorithm in which each data point is linked with its nearest neighbor data point to form initial clusters, and each initial cluster is then linked with its nearest neighbor cluster to form a new cluster. Linking between clusters continues until no more linking is possible. An optimal set of clusters is identified using a conventional cluster validity index. Experimental results on well-known data sets are provided to show the effectiveness of the proposed clustering algorithm.
https://doi.org/10.5391/IJFIS.2004.4.3.337
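The first stage described in this abstract, linking every point with its nearest neighbor to form initial clusters, can be sketched with a union-find structure. This covers only that first stage under the assumption of Euclidean distance. The later cluster-to-cluster linking and the validity-based selection are not shown because the abstract does not specify them. The function name is an assumption.

```python
import math

def nearest_neighbor_clusters(points):
    """Link each point to its nearest neighbor and return one label per point,
    where linked points share a label. Needs at least two points."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Nearest other point; ties go to the lowest index.
        j = min((m for m in range(n) if m != i),
                key=lambda m: math.dist(points[i], points[m]))
        parent[find(i)] = find(j)

    # Renumber the roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

points = [(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)]
print(nearest_neighbor_clusters(points))
```

Because every point has a nearest neighbor, no initial cluster is a singleton, so the number of initial clusters is at most half the number of points.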

Comparison of time series clustering methods and application to power consumption pattern clustering

Kim, Jaehwi;Kim, Jaehee
- Communications for Statistical Applications and Methods, v.27 no.6, pp.589-602, 2020
The development of smart grids has enabled the easy collection of a large amount of power data. Such data contain common patterns, which makes clustering power consumption patterns useful when analyzing big power data. In this paper, cluster analysis based on time series distance functions and clustering algorithms is used to discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study is conducted to compare the distance measures for clustering. Cluster validity measures such as error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed better than the others in the simulation.
https://doi.org/10.29220/CSAM.2020.27.6.589
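One of the validity measures listed here, the Dunn index, depends only on pairwise distances, so it can be combined with any time series distance. The sketch below uses the common definition (smallest between-cluster distance over largest within-cluster diameter) with a pluggable `dist` argument. The function names, the Manhattan distance helper, and the toy series are assumptions for illustration and do not come from the paper.

```python
import math
from itertools import combinations

def dunn_index(items, labels, dist=math.dist):
    """Smallest distance between items in different clusters divided by the
    largest distance between items in the same cluster (larger is better).
    Needs at least two clusters and one cluster with two distinct items."""
    min_between = math.inf
    max_within = 0.0
    for (a, la), (b, lb) in combinations(zip(items, labels), 2):
        d = dist(a, b)
        if la == lb:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_within

def manhattan(a, b):
    """Hypothetical alternative distance between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy "series" of length two; the first labelling separates the two groups.
series = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(dunn_index(series, [0, 0, 1, 1]))
print(dunn_index(series, [0, 1, 0, 1]))
print(dunn_index(series, [0, 0, 1, 1], dist=manhattan))
```

Swapping the `dist` argument is how different time series distances would be compared under the same validity measure.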

Comparison of clustering with yeast microarray gene expression data (효모 마이크로어레이 유전자발현 데이터에 대한 군집화 비교)

Lee, Kyung-A;Kim, Jae-Hee
- Journal of the Korean Data and Information Science Society, v.22 no.4, pp.741-753, 2011
We perform cluster analyses of yeast cell cycle microarray expression data. We compare model-based clustering, K-means, PAM, SOM, and the hierarchical Ward method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared.
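The silhouette value used as a validity measure in this abstract compares, for each item, its average distance to its own cluster with its average distance to the nearest other cluster. The following is a minimal sketch of the mean silhouette under the standard definition, with Euclidean distance as an assumed default; the function name and toy data are illustrative only.

```python
import math

def mean_silhouette(items, labels, dist=math.dist):
    """Average silhouette value over all items, in the range [-1, 1].
    Needs at least two clusters; singleton clusters contribute 0."""
    clusters = {}
    for item, lab in zip(items, labels):
        clusters.setdefault(lab, []).append(item)
    total = 0.0
    for item, lab in zip(items, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # silhouette of a singleton is 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(item, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(item, o) for o in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(items)

items = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(mean_silhouette(items, [0, 0, 1, 1]))
print(mean_silhouette(items, [0, 1, 0, 1]))
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate items that sit closer to another cluster than to their own.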

Clustering load patterns recorded from advanced metering infrastructure (AMI로부터 측정된 전력사용데이터에 대한 군집 분석)

Ann, Hyojung;Lim, Yaeji
- The Korean Journal of Applied Statistics, v.34 no.6, pp.969-977, 2021
We cluster the electricity consumption of households in A-apartment in Seoul, Korea, using the hierarchical K-means clustering algorithm. The data are recorded from the advanced metering infrastructure (AMI), and we focus on electricity consumption during weekday evenings in summer. Compared with conventional clustering algorithms, the hierarchical K-means clustering algorithm has only recently been applied to electricity usage data, and it can identify usage patterns while reducing dimension. We apply the hierarchical K-means algorithm to the AMI data and compare the results using various clustering validity indexes. The results show that the electricity usage patterns are well identified, and the approach is expected to serve as a major basis for future applications in various fields.
https://doi.org/10.5351/KJAS.2021.34.6.969

