• Title/Summary/Keyword: Cluster Validity Index

Search Result 26, Processing Time 0.039 seconds

A Cluster validity Index for Fuzzy Clustering

  • Lee, Haiyoung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.9 no.6
    • /
    • pp.621-626
    • /
    • 1999
  • In this paper a new cluster validation index which is heuristic but able to eliminate the monotonically decreasing tendency occurring in which the number of cluster c gets very large and close to the number of data points n is proposed. We review the FCM algorithm and some conventional cluster validity criteria discuss on the limiting behavior of the proposed validity index and provide some numerical examples showing the effectiveness of the proposed cluster validity index.

  • PDF

Fast Search Algorithm for Determining the Optimal Number of Clusters using Cluster Validity Index (클러스터 타당성 평가기준을 이용한 최적의 클러스터 수 결정을 위한 고속 탐색 알고리즘)

  • Lee, Sang-Wook
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.9
    • /
    • pp.80-89
    • /
    • 2009
  • A fast and efficient search algorithm to determine an optimal number of clusters in clustering algorithms is presented. The method is based on cluster validity index which is a measure for clustering optimality. As the clustering procedure progresses and reaches an optimal cluster configuration, the cluster validity index is expected to be minimized or maximized. In this Paper, a fast non-exhaustive search method for finding the optimal number of clusters is designed and shown to work well in clustering. The proposed algorithm is implemented with the k-mean++ algorithm as underlying clustering techniques using CB and PBM as a cluster validity index. Experimental results show that the proposed method provides the computation time efficiency without loss of accuracy on several artificial and real-life data sets.

A new cluster validity index based on connectivity in self-organizing map (자기조직화지도에서 연결강도에 기반한 새로운 군집타당성지수)

  • Kim, Sangmin;Kim, Jaejik
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.5
    • /
    • pp.591-601
    • /
    • 2020
  • The self-organizing map (SOM) is a unsupervised learning method projecting high-dimensional data into low-dimensional nodes. It can visualize data in 2 or 3 dimensional space using the nodes and it is available to explore characteristics of data through the nodes. To understand the structure of data, cluster analysis is often used for nodes obtained from SOM. In cluster analysis, the optimal number of clusters is one of important issues. To help to determine it, various cluster validity indexes have been developed and they can be applied to clustering outcomes for nodes from SOM. However, while SOM has an advantage in that it reflects the topological properties of original data in the low-dimensional space, these indexes do not consider it. Thus, we propose a new cluster validity index for SOM based on connectivity between nodes which considers topological properties of data. The performance of the proposed index is evaluated through simulations and it is compared with various existing cluster validity indexes.

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia pacific journal of information systems
    • /
    • v.16 no.1
    • /
    • pp.127-144
    • /
    • 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in data mining process, partly because of its low time complexity and the simplicity of practical implementation. Cluster validity indices are used along with the algorithm in order to determine the number of clusters as well as the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, which are selected from forty indices in literature, while considering their applicability to nonhierarchical clustering algorithms. Data sets used in the experiment are generated based on multivariate normal distribution. In particular, four error types including standardization, outlier generation, error perturbation, and noise dimension addition are considered in the comparison. Through the experiment the effects of varying number of points, attributes, and clusters on the performance are analyzed. The result of the simulation experiment shows that Calinski and Harabasz index performs the best through the all datasets and that Davis and Bouldin index becomes a strong competitor as the number of points increases in dataset.

A Cluster Validity Index Using Overlap and Separation Measures Between Fuzzy Clusters (클러스터간 중첩성과 분리성을 이용한 퍼지 분할의 평가 기법)

  • Kim, Dae-Won;Lee, Kwang-H.
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.4
    • /
    • pp.455-460
    • /
    • 2003
  • A new cluster validity index is proposed that determines the optimal partition and optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index exploits an overlap measure and a separation measure between clusters. The overlap measure is obtained by computing an inter-cluster overlap. The separation measure is obtained by computing a distance between fuzzy clusters. A good fuzzy partition is expected to have a low degree of overlap and a larger separation distance. Testing of the proposed index and nine previously formulated indexes on well-known data sets showed the superior effectiveness and reliability of the proposed index in comparison to other indexes.

Nearest neighbor and validity-based clustering

  • Son, Seo H.;Seo, Suk T.;Kwon, Soon H.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.3
    • /
    • pp.337-340
    • /
    • 2004
  • The clustering problem can be formulated as the problem to find the number of clusters and a partition matrix from a given data set using the iterative or non-iterative algorithms. The author proposes a nearest neighbor and validity-based clustering algorithm where each data point in the data set is linked with the nearest neighbor data point to form initial clusters and then a cluster in the initial clusters is linked with the nearest neighbor cluster to form a new cluster. The linking between clusters is continued until no more linking is possible. An optimal set of clusters is identified by using the conventional cluster validity index. Experimental results on well-known data sets are provided to show the effectiveness of the proposed clustering algorithm.

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.1
    • /
    • pp.167-181
    • /
    • 2007
  • Many clustering algorithms and cluster validation techniques for high-dimensional gene expression data have been suggested. The evaluations of these cluster validation techniques have, however, seldom been implemented. In this paper we compared various cluster validity indices for low-dimensional simulation data and real gene expression data, and found that Dunn's index is the most effective and robust, Silhouette index is next and Davies-Bouldin index is the bottom among the internal measures. Jaccard index is much more effective than Goodman-Kruskal index and adjusted Rand index among the external measures.

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.1
    • /
    • pp.135-144
    • /
    • 2004
  • We propose a reproducibility (validity) assessment procedure of K-means cluster analysis by randomly partitioning the data set into three parts, of which two subsets are used for developing clustering rules and one subset for testing consistency of clustering rules. Also, as an alternative to Rand index and corrected Rand index, we propose an entropy-based consistency measure between two clustering rules, and apply it to determination of the number of clusters in K-means clustering.

Validity and Reliability Tests of Neonatal Patient Classification System Based on Nursing Needs (간호요구 정도에 의한 신생아중환자 분류도구의 타당도 및 신뢰도 검증)

  • Ko, Bum Ja;Yu, Mi;Kang, Jin Sun;Kim, Dong Yeon;Bog, Jeong Hee
    • Journal of Korean Clinical Nursing Research
    • /
    • v.18 no.3
    • /
    • pp.354-367
    • /
    • 2012
  • Purpose: This study was done to verify validity and reliability of a neonatal patient classification system (NeoPCS-1). Methods: An expert group of 8 nurse managers and 40 nurses from 8 Neonatal Intensive Care Units in Korea, verified content validity of the measurement using item level content validity index (I-CVI). The participants were nurses caring for 469 neonates. Data were collected from November 11 to December 14, 2011 and analyzed using descriptive statistics, ANOVA, intraclass correlation coefficient, and K-cluster analysis with PASW 18.0 program. Results: Nursing domains and activities included 8 items with 91 activities. I-CVI was above .80 in all areas. Interrater reliability was significant between two raters (r=.95, p<.001). Classification scores for participants according to patient types and nurses' intuition were significantly higher for the following patients; gestational age (${\leq}29$ weeks), body weight (<1,000 gm), and transfer from hospital. Six groups were classified using cluster analysis method based on nursing needs. Patient classification scores were significantly different for the groups. Conclusion: These results show adequate validity and reliability for the NeoPCS-1 based on nursing needs. Study is needed to refine the measurement and develop index scores to estimate number of nurses needed for adequate neonatal care.

A fuzzy cluster validity index for the evaluation of Fuzzy C-Means algorithm (최적 클러스터 분할을 위한 FCM 평가 인덱스)

  • 김대원;이광현
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04c
    • /
    • pp.374-376
    • /
    • 2003
  • 본 논문에서는 Fussy C-Means (FCM) 알고리즘에 의해 계산된 퍼지 클러스터들에 대한 평가 인덱스를 제안한다. 제안된 인덱스는 퍼지 클러스터들간의 인접성(inter-cluster proximity)을 이용한다. 클러스터 인접성을 도입함으로써 클러스터간의 중첩 정도를 계산할 수 있다. 따라서, 인접성 값이 낮을수록 클러스터들은 공간에 잘 분포하게 됨을 알 수 있다. 다양한 데이터 집합에 대한 실험을 통해서 제안된 인덱스의 효율성과 신뢰성을 검증하였다.

  • PDF