Search | Korea Science

Empirical Comparisons of Clustering Algorithms using Silhouette Information

Jun, Sung-Hae;Lee, Seung-Joo
- International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.31-36 / 2010
Many clustering algorithms are used in diverse fields. When a given data set must be grouped into clusters, many algorithms based on similarity or distance measures can be considered. Most clustering work has relied on hierarchical and non-hierarchical algorithms. Researchers generally choose among these algorithms case by case, and they must select a suitable clustering method subjectively, based on prior knowledge. In this paper, to address this subjectivity, we empirically compare popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and cluster variance. We verify the comparison with experiments on data sets from the UCI Machine Learning Repository. As a result, we are able to use efficient and objective clustering algorithms.
https://doi.org/10.5391/IJFIS.2010.10.1.031
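
The abstract above evaluates clusterings with silhouette information. As a rough illustration only, here is a minimal sketch of the standard silhouette value s(i) = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to any other cluster. The function name `silhouette_scores`, the Euclidean distance, and the convention of scoring singleton clusters as 0 are assumptions made here, not details taken from the paper. The sketch expects at least two clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_scores(points, labels):
    """Return the silhouette value s(i) for every point (hypothetical helper)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
```

Averaging the returned values over all points gives one number per clustering, which is one way to compare different algorithms or different numbers of clusters on the same data.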

Environmental Survey Data Modeling Using K-means Clustering Techniques

Park, Hee-Chang;Cho, Kwang-Hyun
- Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.557-566 / 2005
Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering for environmental information. The outputs of k-means clustering can be used for environmental preservation and environmental improvement.
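
For readers unfamiliar with the partitional method named in this abstract, the following is a minimal sketch of k-means in its common Lloyd's-algorithm form: assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat until assignments stop changing. The function name `k_means`, the random initialization from the data points, and the fixed seed are assumptions for illustration. The paper does not state which implementation or initialization it used.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, iterations=100, seed=0):
    """Cluster a list of coordinate tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return new_labels, centroids
```

The result depends on the starting centroids, so in practice the procedure is often run several times with different seeds and the best partition is kept.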

Environmental Survey Data Modeling using K-means Clustering Techniques

Park, Hee-Chang;Cho, Kwang-Hyun
- Proceedings of the Korean Data and Information Science Society Conference / 2004.10a / pp.77-86 / 2004
Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering for environmental information. The outputs of k-means clustering can be used for environmental preservation and environmental improvement.

A Two-Stage Method for Near-Optimal Clustering (최적에 가까운 군집화를 위한 이단계 방법)

윤복식
- Journal of the Korean Operations Research and Management Science Society / v.29 no.1 / pp.43-56 / 2004
The purpose of clustering is to partition a set of objects into several clusters based on an appropriate similarity measure. In most cases, clustering is performed without any prior information on the number of clusters or the structure of the given data, which makes clustering a very complicated combinatorial optimization problem. In this paper we propose a general-purpose clustering method that can determine the proper number of clusters as well as efficiently carry out clustering analysis for various types of data. The method is composed of two stages. In the first stage, two different hierarchical clustering methods are used to obtain a reasonably good clustering result, which is improved in the second stage by an ASA (accelerated simulated annealing) algorithm equipped with specially designed perturbation schemes. Extensive experimental results are given to demonstrate the usefulness of our ASA clustering method.

Medoid Determination in Deterministic Annealing-based Pairwise Clustering

Lee, Kyung-Mi;Lee, Keon-Myung
- International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.3 / pp.178-183 / 2011
The deterministic annealing-based clustering algorithm is an EM-based algorithm that behaves like simulated annealing yet is less sensitive to parameter initialization. Pairwise clustering is a technique that clusters using inter-entity distance information without requiring detailed attribute information. The pairwise deterministic annealing-based clustering algorithm alternates between estimating mean fields and updating the membership degrees of data objects to clusters until a termination condition holds. Lacking attribute value information, pairwise clustering algorithms do not explicitly determine the centroids or medoids of clusters, either during the clustering process or at its end. This paper proposes a method to identify medoids as the centers of the formed clusters for the pairwise deterministic annealing-based clustering algorithm. Experimental results show that the proposed method locates meaningful medoids.
https://doi.org/10.5391/IJFIS.2011.11.3.178
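
To make the notion of a medoid concrete when only pairwise distances are available, the sketch below uses the textbook definition: the medoid of a cluster is the member whose total distance to the other members is smallest. This is not the method proposed in the paper, which works with the membership degrees produced by deterministic annealing. The function name `find_medoids` and the hard (crisp) cluster labels are assumptions for illustration.

```python
def find_medoids(distances, labels):
    """Map each cluster label to the index of its medoid (illustrative sketch).

    distances: square matrix (list of lists) of pairwise distances.
    labels: hard cluster label for each object (an assumption; the paper's
            algorithm produces soft membership degrees instead).
    """
    # Group object indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    # The medoid minimizes the summed distance to all members of its cluster.
    return {
        label: min(members, key=lambda i: sum(distances[i][j] for j in members))
        for label, members in clusters.items()
    }
```

Because only the distance matrix is consulted, the approach needs no attribute values, which is the setting the abstract describes.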

K-means Clustering for Environmental Indicator Survey Data

Park, Hee-Chang;Cho, Kwang-Hyun
- Proceedings of the Korean Data and Information Science Society Conference / 2005.04a / pp.185-192 / 2005
There are many data mining techniques, such as association rules, decision trees, neural network analysis, clustering, genetic algorithms, Bayesian networks, and memory-based reasoning. We analyze the 2003 Gyeongnam social indicator survey data using k-means clustering for environmental information. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. The k-means clustering outputs can be applied to environmental preservation and environmental improvement.

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

Park, Nojin;Ko, Hanseok
- Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked auto-encoder based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality with a convolutional auto-encoder, which captures and reflects local information, and then clusters similar data samples with a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering results in terms of clustering accuracy and normalized mutual information.
https://doi.org/10.9717/kmms.2020.23.1.001

Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

Do, Jin Hwan;Choi, Dong-Kug
- Molecules and Cells / v.25 no.2 / pp.279-288 / 2008
The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites, and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and dissimilarity metrics used, as well as on user-selectable parameters such as the desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means, and self-organizing maps to complex clustering algorithms such as fuzzy clustering.

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
- International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.4 / pp.254-268 / 2013
For real-world clustering tasks, the input data is typically not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partitioning such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.
https://doi.org/10.5391/IJFIS.2013.13.4.254

Spectral clustering based on the local similarity measure of shared neighbors

Cao, Zongqi;Chen, Hongjia;Wang, Xiang
- ETRI Journal / v.44 no.5 / pp.769-779 / 2022
Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.
https://doi.org/10.4218/etrij.2021-0230
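
The idea of a shared-neighbor similarity can be sketched in its simplest, unweighted form: two points are similar to the extent that their k-nearest-neighbor lists overlap. The paper's measure additionally weights the shared neighbors in a directed k-nearest neighbor graph, and that weighting is not reproduced here. The function names `knn_sets` and `shared_neighbor_similarity` and the Euclidean distance are assumptions for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def shared_neighbor_similarity(points, k):
    """Unweighted shared-neighbor counts (a simplification of the paper's measure)."""
    nn = knn_sets(points, k)
    n = len(points)
    # Entry [i][j] counts the neighbors that points i and j have in common.
    return [[len(nn[i] & nn[j]) for j in range(n)] for i in range(n)]
```

A matrix of this kind could stand in for the usual Gaussian affinity matrix in a spectral clustering pipeline, with k as the only parameter to tune.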
