Search | Korea Science

Geodesic Clustering for Covariance Matrices

Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
- Communications for Statistical Applications and Methods, v.22 no.4, pp.321-331, 2015
The K-means clustering algorithm is a popular and widely used method for clustering. This paper considers a geodesic clustering algorithm for covariance matrices, that is, data consisting of symmetric positive definite (SPD) matrices, which combines the K-means clustering framework with the Riemannian (non-Euclidean) geometric structure of SPD matrices. A K-means clustering algorithm alternates between two main steps, which require a dissimilarity measure between two matrix data points and a way of computing centroids for the observations in each cluster. To use the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for SPD matrices. We demonstrate the proposed method through simulations and an application to real financial data.
https://doi.org/10.5351/CSAM.2015.22.4.321
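
As a rough illustration of the dissimilarity step described in the abstract above, the sketch below computes a geodesic distance between two SPD matrices and uses it for a K-means style assignment. It assumes the affine-invariant Riemannian metric, which the abstract does not name, and it is not the authors' implementation. The function names `spd_geodesic_distance` and `assign_to_nearest` are hypothetical.

```python
import numpy as np


def spd_geodesic_distance(a, b):
    """Geodesic distance between SPD matrices a and b.

    Assumption: the affine-invariant metric, under which the distance is
    the root sum of squared logs of the eigenvalues of a^{-1/2} b a^{-1/2}.
    """
    # Inverse square root of a via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    # Symmetrize before the eigenvalue call to absorb rounding error.
    m = a_inv_sqrt @ b @ a_inv_sqrt
    eigvals = np.linalg.eigvalsh((m + m.T) / 2.0)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


def assign_to_nearest(matrices, centroids):
    """Assignment step: index of the closest centroid for each matrix."""
    return [
        int(np.argmin([spd_geodesic_distance(c, x) for c in centroids]))
        for x in matrices
    ]
```

The centroid update (the intrinsic mean) is not shown here; it generally has no closed form for more than two matrices and is computed iteratively.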

Data Clustering Method Using a Modified Gaussian Kernel Metric and Kernel PCA

Lee, Hansung;Yoo, Jang-Hee;Park, Daihee
- ETRI Journal, v.36 no.3, pp.333-342, 2014
Most hyper-ellipsoidal clustering (HEC) approaches use the Mahalanobis distance as a distance metric. It has been proven that HEC, under this condition, cannot be realized since the cost function of partitional clustering is a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters) and propose a practical HEC algorithm that efficiently handles ellipsoidal clusters of different sizes and densities. We then refine the HEC algorithm by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods lead to a significant improvement in the clustering results over the K-means algorithm, the fuzzy C-means algorithm, the GMM-EM algorithm, and the HEC algorithm based on minimum-volume ellipsoids using the Mahalanobis distance.
https://doi.org/10.4218/etrij.14.0113.0553

A Mixed Co-clustering Algorithm Based on Information Bottleneck

Liu, Yongli;Duan, Tianyi;Wan, Xing;Chao, Hao
- Journal of Information Processing Systems, v.13 no.6, pp.1467-1486, 2017
Fuzzy co-clustering is sensitive to noisy data. To overcome this noise sensitivity, possibilistic clustering relaxes the constraints in FCM-type fuzzy (co-)clustering. In this paper, we introduce a new possibilistic fuzzy co-clustering algorithm based on information bottleneck (ibPFCC). This algorithm combines fuzzy co-clustering and possibilistic clustering, and formulates an objective function that includes a distance function employing information bottleneck theory to measure the distance between a feature data point and a feature cluster centroid. Experiments were conducted on three datasets and one artificial dataset. Experimental results show that ibPFCC is better than such prominent fuzzy (co-)clustering algorithms as FCM, FCCM, RFCC and FCCI in terms of accuracy and robustness.
https://doi.org/10.3745/JIPS.01.0019

An Improved Automated Spectral Clustering Algorithm

Xiaodan Lv
- Journal of Information Processing Systems, v.20 no.2, pp.185-199, 2024
In this paper, an improved automated spectral clustering (IASC) algorithm is proposed to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to automatically determine the number of clusters. First, a cluster number evaluation factor based on the optimal clustering principle is proposed. By iterating through different k values, the value corresponding to the largest evaluation factor is selected as the top-ranked number of clusters. Second, the IASC algorithm adopts a density-sensitive distance to measure the similarity between sample points. This assigns a high similarity to data distributed in the same high-density area. Third, to improve clustering accuracy, the IASC algorithm uses the cosine angle classification method instead of K-means to classify the eigenvectors. Six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) were compared with the proposed algorithm on six datasets. The results show that the IASC algorithm not only automatically determines the number of clusters but also obtains better clustering accuracy on both synthetic and UCI datasets.
https://doi.org/10.3745/JIPS.04.0307

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

Ma, Zhihua;Xue, Yishu;Hu, Guanyu
- Communications for Statistical Applications and Methods, v.26 no.1, pp.57-67, 2019
Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions in different regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms of the income distributions, or by using summary statistics such as the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. Simulation studies show that the energy distance based method is competitive with other common methods in hypothesis testing, and that the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied to data from the China Health and Nutrition Survey 2011. The results indicate that there are significant differences among the income distributions of the 12 provinces in the dataset. Applying a 4-means clustering algorithm, we obtain the clustering results of the income distributions in the 12 provinces.
https://doi.org/10.29220/CSAM.2019.26.1.057
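
To make the dissimilarity measure in the abstract above concrete, here is a minimal sketch of the sample squared energy distance between two one-dimensional samples, using the standard definition 2E|X-Y| - E|X-X'| - E|Y-Y'| with plain averages over all pairs. It is only an illustration under that assumption, not the authors' procedure. The name `energy_distance_sq` is hypothetical and the example values are made up.

```python
import numpy as np


def energy_distance_sq(x, y):
    """Sample squared energy distance between 1-D samples x and y.

    Assumption: plain averages over all pairs (V-statistic form),
    so identical samples give exactly zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mean absolute difference between samples and within each sample.
    between = np.abs(x[:, None] - y[None, :]).mean()
    within_x = np.abs(x[:, None] - x[None, :]).mean()
    within_y = np.abs(y[:, None] - y[None, :]).mean()
    return float(2.0 * between - within_x - within_y)


# Made-up income samples for two hypothetical regions.
region_a = [1.0, 2.0, 3.0]
region_b = [4.0, 5.0, 6.0]
gap = energy_distance_sq(region_a, region_b)
```

A larger value indicates that the two empirical distributions differ more. A clustering procedure of the kind described in the abstract would use such a value in place of the Euclidean distance between points.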

COUNTING OF FLOWERS BASED ON K-MEANS CLUSTERING AND WATERSHED SEGMENTATION

PAN ZHAO;BYEONG-CHUN SHIN
- Journal of the Korean Society for Industrial and Applied Mathematics, v.27 no.2, pp.146-159, 2023
This paper proposes a hybrid algorithm combining K-means clustering and watershed algorithms for flower segmentation and counting. We use the K-means clustering algorithm to obtain the main colors in a complex background according to the cluster centers and then apply a color space transformation to extract pixel values for the hue, saturation, and value of the flower color. Next, we apply the threshold segmentation technique to segment flowers precisely and obtain the binary image of the flowers. Based on this, we apply the Euclidean distance transformation to obtain the distance map and use it to find the local maxima of the connected components. Afterward, the proposed algorithm adaptively determines a minimum distance between peaks and applies it to label connected components using watershed segmentation with eight-connectivity. On a dataset of 30 images, the test results reveal that the proposed method is more efficient and precise for counting overlapped flowers, regardless of the degree of overlap, the number of overlaps, and relatively irregular shapes.
https://doi.org/10.12941/jksiam.2023.27.146

Medoid Determination in Deterministic Annealing-based Pairwise Clustering

Lee, Kyung-Mi;Lee, Keon-Myung
- International Journal of Fuzzy Logic and Intelligent Systems, v.11 no.3, pp.178-183, 2011
The deterministic annealing-based clustering algorithm is an EM-based algorithm that behaves like the simulated annealing method yet is less sensitive to the initialization of parameters. Pairwise clustering is a technique that performs clustering using inter-entity distance information without requiring detailed attribute information. The pairwise deterministic annealing-based clustering algorithm repeatedly alternates between estimating the mean-fields and updating the membership degrees of data objects to clusters until the termination condition holds. Lacking attribute value information, pairwise clustering algorithms do not explicitly determine the centroids or medoids of clusters during the clustering process or at its end. This paper proposes a method to identify the medoids as the centers of the formed clusters for the pairwise deterministic annealing-based clustering algorithm. Experimental results show that the proposed method locates meaningful medoids.
https://doi.org/10.5391/IJFIS.2011.11.3.178
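
For readers unfamiliar with medoids in the pairwise setting, the sketch below picks, for each cluster, the member whose total distance to the other members is smallest, using only a pairwise distance matrix. This is the common textbook definition applied to hard cluster labels. It is an assumption made for illustration and not the method proposed in the paper, which works with deterministic annealing membership degrees. The name `cluster_medoids` is hypothetical.

```python
import numpy as np


def cluster_medoids(dist, labels):
    """Return {cluster label: index of its medoid}.

    dist is an n-by-n pairwise distance matrix and labels assigns each
    of the n objects to one cluster (hard labels, an assumption here).
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Total distance from each member to all members of its cluster.
        totals = dist[np.ix_(members, members)].sum(axis=1)
        medoids[int(k)] = int(members[np.argmin(totals)])
    return medoids


# Five objects on a line; only their pairwise distances are used.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
pairwise = np.abs(points[:, None] - points[None, :])
result = cluster_medoids(pairwise, [0, 0, 0, 1, 1])
```

No attribute vectors are needed beyond building the toy distance matrix, which matches the pairwise setting in which only inter-entity distances are available.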

Density Based Spatial Clustering Method Considering Obstruction (장애물을 고려한 밀도 기반의 공간 클러스터링 기법)

임현숙;김호숙;용환승;이상호;박승수
- Journal of Korea Multimedia Society, v.6 no.3, pp.375-383, 2003
Clustering in spatial mining groups similar objects based on their distance, connectivity, or relative density in space. In the real world, there exist many physical objects such as rivers, lakes, and highways, and their presence may affect the result of clustering. In this paper, we define a distance that accounts for obstacles and, using it, propose a density-based clustering algorithm called DBSCAN-O that handles obstacles. Our experimental results show that DBSCAN-O produces clustering results different from those of the earlier density-based clustering algorithm DBSCAN.

A Density Peak Clustering Algorithm Based on Information Bottleneck

Yongli Liu;Congcong Zhao;Hao Chao
- Journal of Information Processing Systems, v.19 no.6, pp.778-790, 2023
Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measurement. To address this limitation, we draw inspiration from information bottleneck theory and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments using ten carefully selected datasets and compared the results with three other algorithms. The experimental results demonstrate that our information bottleneck-based density peak clustering (IBDPC) algorithm consistently achieves high levels of accuracy, highlighting its potential as a valuable tool for data clustering tasks.
https://doi.org/10.3745/JIPS.04.0294

Image Segmentation Based on the Fuzzy Clustering Algorithm using Average Intracluster Distance (평균내부거리를 적용한 퍼지 클러스터링 알고리즘에 의한 영상분할)

You, Hyu-Jai;Ahn, Kang-Sik;Cho, Seok-Je
- The Transactions of the Korea Information Processing Society, v.7 no.9, pp.3029-3036, 2000
Image segmentation is one of the important processes in image information extraction for computer vision systems. Fuzzy clustering methods have been used extensively in image segmentation because they extract feature information of the region. Most fuzzy clustering methods use the Fuzzy C-means (FCM) algorithm. This algorithm can misclassify data when clusters differ in size because the degree of membership depends heavily on the distance between the data and the centroids of the clusters. This paper proposes a fuzzy clustering algorithm using the Average Intracluster Distance that classifies data uniformly without regard to the size of the data sets. The Average Intracluster Distance takes an average over the vector set belonging to each cluster and increases in exact proportion to its size and density. The experimental results demonstrate that the proposed approach has the g
