• Title/Summary/Keyword: similarity based clustering

Spectral clustering based on the local similarity measure of shared neighbors

  • Cao, Zongqi;Chen, Hongjia;Wang, Xiang
    • ETRI Journal / v.44 no.5 / pp.769-779 / 2022
  • Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.
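
The abstract above reports normalized mutual information (NMI) as one of its evaluation metrics but does not say which normalization is used. The following is a minimal sketch, assuming the arithmetic-mean normalization; the function name `normalized_mutual_information` is hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI between two labelings, normalized by the mean of the two entropies.

    The arithmetic-mean normalization is an assumption; other conventions
    (geometric mean, min, max) are also in common use.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("labelings must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mi = 0.0
    for (t, p), c in count_joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    denom = (entropy(count_true) + entropy(count_pred)) / 2
    # Both labelings trivial (a single cluster each): treat as perfect agreement.
    return 1.0 if denom == 0 else mi / denom
```

A labeling that matches the ground truth up to a renaming of cluster ids scores 1, while a labeling that is statistically independent of it scores 0.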

Transactions Clustering based on Item Similarity (아이템의 유사도를 고려한 트랜잭션 클러스터링)

  • 이상욱;김재련
    • Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.250-257 / 2002
  • Clustering is a data mining method that discovers interesting data distributions in very large databases. In traditional data clustering, the similarity of a cluster of objects is measured by the pairwise similarity of the objects in that cluster. In view of the nature of transaction clustering, we devise in this paper a novel measure called item similarity and use it to perform clustering. With this item similarity measure, we develop an efficient clustering algorithm for target marketing in each group.

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal / v.38 no.3 / pp.540-550 / 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms typically measure the similarity by using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve the spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points, and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform the traditional spectral clustering algorithms.
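
The first of the two ideas above, similarity based on the number of shared nearest neighbors in a directed kNN graph, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' exact definition: it uses Euclidean distance and a plain shared-neighbor count, and the helper names `knn_lists` and `shared_neighbor_similarity` are hypothetical.

```python
import math


def knn_lists(points, k):
    """Directed kNN graph: neighbors[i] is the set of the k points nearest to i."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def shared_neighbor_similarity(points, k):
    """Similarity matrix whose (i, j) entry counts neighbors shared by i and j."""
    nbrs = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed measure: size of the intersection of the two kNN sets.
            sim[i][j] = sim[j][i] = len(nbrs[i] & nbrs[j])
    return sim


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sim = shared_neighbor_similarity(points, k=2)
```

In this toy example, points from different groups share no neighbors, so their similarity is zero, and k is the only parameter, as in the abstract.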

A Max-Flow-Based Similarity Measure for Spectral Clustering

  • Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
    • ETRI Journal / v.35 no.2 / pp.311-320 / 2013
  • In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experimental results show the superiority of the new similarity: 1) the max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) it is robust and insensitive to the parameters.
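
For context, the Gaussian kernel-based affinity matrix that the abstract treats as the baseline can be sketched as below. This illustrates only the baseline, not the max-flow measure; the function name `gaussian_affinity`, the default `sigma`, and the zero diagonal are assumptions made for the example.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Affinity matrix with w[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The diagonal is left at zero, a common convention that is assumed here.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between the two points.
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


w = gaussian_affinity([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], sigma=1.0)
```

Because the affinity depends only on straight-line distance, two points at opposite ends of an elongated cluster receive a very small value, which is the weakness the abstract describes.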

Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure (시퀀스 요소 기반의 유사도를 이용한 시퀀스 데이터 클러스터링)

  • 오승준;김재련
    • Proceedings of the Korea Intelligent Information System Society Conference / 2004.11a / pp.221-229 / 2004
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a method for clustering such sequence datasets. The similarity between sequences must be decided before clustering the sequences. This study proposes a new similarity measure to compute the similarity between two sequences using a sequence element. Two clustering algorithms using the proposed similarity measure are proposed: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed clustering algorithms is better than that of clusters produced by traditional clustering algorithms.

Development of Similarity-Based Document Clustering System (유사성 계수에 의한 문서 클러스터링 시스템 개발)

  • Woo Hoon-Shik;Yim Dong-Soon
    • Proceedings of the Society of Korea Industrial and System Engineering Conference / 2002.05a / pp.119-124 / 2002
  • Clustering of data is of great interest in many data mining applications. In the field of document clustering, a document is represented as a data point in a high-dimensional space. Therefore, document clustering can be accomplished with general data clustering techniques. In this paper, we introduce a document clustering system based on similarity among documents. The developed system consists of three functions: 1) gathering documents using a search agent; 2) determining similarity coefficients between any two documents from term frequencies; 3) clustering documents with the similarity coefficients. In particular, the document clustering is accomplished by a hybrid algorithm that combines genetic and K-Means methods.
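
Step 2 above, a similarity coefficient computed from term frequencies, might look like the following sketch. The abstract does not name the coefficient, so cosine similarity over raw term counts is assumed here, and the helper names and whitespace tokenization are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Raw term counts using lowercase whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


doc_a = term_frequencies("spectral clustering of documents")
doc_b = term_frequencies("clustering of web documents")
score = cosine_similarity(doc_a, doc_b)
```

The resulting pairwise scores would then feed the clustering step; the hybrid genetic and K-Means algorithm itself is not sketched here.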

Shot Group and Representative Shot Frame Detection using Similarity-based Clustering

  • Lee, Gye-Sung
    • Journal of the Korea Society of Computer and Information / v.21 no.9 / pp.37-43 / 2016
  • This paper introduces a method for video shot group detection, which is needed for efficient video management and summarization. The proposed method detects shots based on low-level visual properties and performs temporal and spatial clustering based on the visual similarity of neighboring shots. Shot groups created by temporal clustering are further clustered into smaller groups with respect to visual similarity. A set of representative shot frames is selected from each cluster of the smaller groups representing a scene. Shots excluded from temporal clustering are also clustered into groups from which representative shot frames are selected. A number of video clips were collected and used to evaluate the accuracy of shot group detection, and the method achieved 91% accuracy. The number of representative shot frames is reduced to 1/3 of the total shot frames. The experiment also shows an inverse relationship between accuracy and compression rate.

Transactions Clustering based on Item Similarity (항목 유사도를 고려한 트랜잭션 클러스터링)

  • 이상욱;김재련
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.179-193 / 2003
  • Clustering is a data mining method that helps discover interesting data groups in large databases. In traditional data clustering, the similarity between objects in a cluster is measured by the pairwise similarity of the objects. In this paper, considering the nature of transaction data, we devise an advanced measure called item similarity and use it to perform clustering. The new algorithm measures similarity by incorporating the relationships between different attributes. With this item similarity measure, we develop an efficient clustering algorithm for target marketing in each group.
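
As a point of reference for the traditional pairwise approach mentioned above, the sketch below computes a Jaccard coefficient between two transactions. Jaccard is assumed here as a typical pairwise measure for item sets; the abstract does not name the traditional measure and does not define item similarity, so this does not reproduce the proposed method. The function name `jaccard` is hypothetical.

```python
def jaccard(transaction_a, transaction_b):
    """Jaccard coefficient: shared items divided by all distinct items."""
    a, b = set(transaction_a), set(transaction_b)
    union = a | b
    # Two empty transactions are treated as identical (an assumed convention).
    return len(a & b) / len(union) if union else 1.0


basket_1 = {"milk", "bread"}
basket_2 = {"milk", "eggs"}
similarity = jaccard(basket_1, basket_2)
```

A pairwise measure of this kind counts only exact item matches, so it ignores any relationship between different items, which is the gap the item similarity measure is meant to address.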

A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu;Congcong Zhao;Hao Chao
    • Journal of Information Processing Systems / v.19 no.6 / pp.778-790 / 2023
  • Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measurement. To address this limitation, we draw inspiration from the information bottleneck theory, and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments using ten carefully selected datasets and compared the results with three other algorithms. The experimental results demonstrate that our information bottleneck-based density peaks clustering (IBDPC) algorithm consistently achieves high levels of accuracy, highlighting its potential as a valuable tool for data clustering tasks.

Regional Grouping of the interconnected network system through Sequential Clustering (순차적 클러스터링을 이용한 지역별 그룹핑)

  • Kim, Hyun-Hong;Song, Hyoung-Yong;Kim, Jin-Ho;Park, Jong-Bae;Shin, Jung-Rin
    • Proceedings of the KIEE Conference / 2007.11b / pp.252-254 / 2007
  • This paper introduces sequential clustering as a tool for effectively clustering large-scale electrical systems. The interconnected network system retains information about the location of each line. Using this information, the paper carries out an initial clustering based on the transmission usage rate, compares the results of similarity measures for regional information with those for regional price, and describes the details of the clustering method. The transmission usage rate is derived from power flow based on congestion costs, and the similarity measurements are modified using the FCM algorithm. The paper also aims to validate the proposed clustering method by comparing it with existing clustering methods that use similarity measurement. The proposed algorithm is demonstrated on the IEEE 39-bus RTS.