• Title/Summary/Keyword: Local clustering

Spectral clustering based on the local similarity measure of shared neighbors

  • Cao, Zongqi;Chen, Hongjia;Wang, Xiang
    • ETRI Journal / v.44 no.5 / pp.769-779 / 2022
  • Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.
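
A minimal sketch of a shared-neighbor similarity on a directed k-nearest neighbor graph, with k as the only parameter. The abstract does not give the paper's weighting, so the fraction of shared neighbors used here is an assumption, and the names `knn_lists` and `shared_neighbor_similarity` are hypothetical. The resulting matrix would then be passed to an ordinary spectral clustering step, which is not shown.

```python
import math

def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def shared_neighbor_similarity(points, k):
    """sim[i][j] = (number of neighbors shared by i and j) / k.

    This simple ratio is an assumed stand-in, not the paper's weighting.
    """
    nbrs = knn_lists(points, k)
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(nbrs[i] & nbrs[j])
            sim[i][j] = sim[j][i] = shared / k
    return sim

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
sim = shared_neighbor_similarity(points, k=2)
```

Points in the same group share neighbors and get a positive similarity, while points in different groups share none and get zero.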

Local Distribution Based Density Clustering for Speaker Diarization (화자분할을 위한 지역적 특성 기반 밀도 클러스터링)

  • Rho, Jinsang;Shon, Suwon;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.303-309 / 2015
  • Speaker diarization is the task of determining the speakers in unlabeled data, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used for it because of its simplicity and computational efficiency. One challenge, however, is that when different clusters in a non-spatial dataset are adjacent to each other, over-clustering may occur, which degrades the performance of DBSCAN. In this paper, we identify the drawbacks of DBSCAN and propose a new density clustering algorithm based on the local distribution around each object. Variable density criteria for the local density and spread of each object are used for effective data clustering. We compare the proposed algorithm with DBSCAN in terms of clustering accuracy. Experimental results confirm that the proposed algorithm achieves higher accuracy than DBSCAN without over-clustering, and that the new approach based on local density and object spread is efficient.

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

  • Park, Nojin;Ko, Hanseok
    • Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked auto-encoder based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality with a convolutional auto-encoder, which captures and reflects local information, and then clusters similar data samples with a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering accuracy and normalized mutual information.

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3972-3988 / 2020
  • Existing methods rely on statistical features to extract local words for microblog user geolocation. The extracted words include many non-local words, which lowers geolocation accuracy. Considering both the statistical and semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed: the remaining words are clustered by semantic similarity according to the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed, which selects the cluster subset with the best geolocation effect. Words in the selected cluster subset are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate the microblog user. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that it outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
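
The sequential backward subset search mentioned above can be sketched as a greedy loop that starts from all word clusters and keeps dropping the cluster whose removal most improves a score. This is only an illustration: the function name, the cluster labels, and the toy `gain` table are hypothetical, and in the paper's setting the score would presumably be the geolocation performance of a classifier trained on the words of the selected clusters.

```python
def sequential_backward_search(items, score):
    """Greedy wrapper selection by backward elimination.

    In each pass, try removing every remaining item and keep the single
    removal that raises `score` the most; stop when no removal helps.
    """
    selected = list(items)
    best = score(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        best_candidate = selected
        for item in selected:
            candidate = [x for x in selected if x != item]
            s = score(candidate)
            if s > best:
                best, best_candidate = s, candidate
                improved = True
        selected = best_candidate
    return selected, best

# Hypothetical per-cluster contribution to a validation score.
gain = {"a": 3, "b": -2, "c": 2, "d": -1}

def toy_score(subset):
    # Stand-in for "train a classifier on this subset and evaluate it".
    return sum(gain[c] for c in subset)

selected, best = sequential_backward_search(["a", "b", "c", "d"], toy_score)
```

With this toy score the loop removes `b`, then `d`, and stops at the subset `["a", "c"]` because removing either remaining cluster would lower the score.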

A New Scheme for Maximizing Network Lifetime in Wireless Sensor Networks (무선 센서네트워크에서 네트워크수명 극대화 방안)

  • Kim, Jeong Sahm
    • Journal of Korea Society of Digital Industry and Information Management / v.10 no.2 / pp.47-59 / 2014
  • In this paper, I propose a new energy-efficient clustering scheme that prolongs network lifetime by reducing energy consumption at the sensor nodes. Each node decides whether to participate in clustering with a probability based on local density. The scheme is useful when sensor nodes are deployed unevenly within the sensing area. By dynamically adjusting the participation probability according to the local density of nodes, the energy consumption of the network is reduced and its lifetime is extended. In regions where nodes are densely deployed, energy consumption is reduced by limiting the number of nodes that participate in clustering. Computer simulation verifies that the proposed scheme is more energy efficient than the LEACH protocol when nodes are densely located in a specific area.
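
A rough sketch of density-dependent participation, assuming a rule the abstract does not state: each node counts its neighbors within a radius and participates with probability `target / (density + 1)`, capped at 1, so that fewer nodes take part where the deployment is dense. The function names, the `radius` and `target` parameters, and the rule itself are all hypothetical.

```python
import math
import random

def local_density(nodes, radius):
    """For each node, count the other nodes lying within `radius`."""
    return [
        sum(1 for j, q in enumerate(nodes)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(nodes)
    ]

def participation_probability(density, target=2.0):
    """Assumed rule (not the paper's formula): probability falls as density rises."""
    return min(1.0, target / (density + 1))

def choose_participants(nodes, radius, rng, target=2.0):
    """Return the indices of nodes that randomly opt in to clustering."""
    densities = local_density(nodes, radius)
    return [
        i for i, d in enumerate(densities)
        if rng.random() < participation_probability(d, target)
    ]

# Five nodes packed together plus one isolated node.
nodes = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.2, 0.3),
         (10.0, 10.0)]
rng = random.Random(0)  # seeded only to make the sketch repeatable
participants = choose_participants(nodes, radius=1.0, rng=rng)
```

Under this rule the isolated node always participates, while each node in the dense group opts in with probability 0.4.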

Fuzzy k-Means Local Centers of the Social Networks

  • Woo, Won-Seok;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / v.19 no.2 / pp.213-217 / 2012
  • Fuzzy k-means clustering is an attractive alternative to ordinary k-means clustering for analyzing multivariate data. Fuzzy versions yield more natural output by allowing the k groups to overlap. In this study, we modify a fuzzy k-means clustering algorithm for use with undirected social networks, apply the algorithm to both real and simulated cases, and report the results.
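
To show how fuzzy k-means lets groups overlap, here is the standard membership update for a single point, written for ordinary distances to k centers. It is the textbook rule, not the network-specific modification of the paper, which the abstract does not describe; the function name and the fuzzifier default `m=2.0` are assumptions.

```python
def fuzzy_memberships(distances, m=2.0):
    """Memberships of one point in k groups, given its distance to each center.

    Standard fuzzy k-means rule: u_j is proportional to d_j ** (-2 / (m - 1)),
    normalized so the memberships sum to 1. Requires m > 1.
    """
    # A point sitting exactly on one or more centers belongs only to those.
    zeros = [j for j, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(distances))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

# A point twice as far from the second center as from the first.
u = fuzzy_memberships([1.0, 2.0])
```

With `m = 2` the point above belongs to the nearer group with degree 0.8 and to the farther one with degree 0.2, rather than being assigned to a single group.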

Design of Pattern Classification Rule based on Local Linear Discriminant Analysis Classifier by using Differential Evolutionary Algorithm (차분진화 알고리즘을 이용한 지역 Linear Discriminant Analysis Classifier 기반 패턴 분류 규칙 설계)

  • Roh, Seok-Beom;Hwang, Eun-Jin;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.1 / pp.81-86 / 2012
  • In this paper, we propose a new design methodology for a pattern classification rule based on local linear discriminant analysis, an extension of generic linear discriminant analysis that is applied within local areas partitioned from the whole input space. Two methods are used to partition the input space into several local areas: k-means clustering, which is an unsupervised clustering method, and the differential evolutionary algorithm, which is an optimization algorithm. In addition, the experiments include a comparative analysis against several commonly used methods.

A Study on Data Clustering Method Using Local Probability (국부 확률을 이용한 데이터 분류에 관한 연구)

  • Son, Chang-Ho;Choi, Won-Ho;Lee, Jae-Kook
    • Journal of Institute of Control, Robotics and Systems / v.13 no.1 / pp.46-51 / 2007
  • In this paper, we propose a new data clustering method using local probability and hypothesis theory. To cluster the test data set, we analyze its local area using a local probability distribution and decide the candidate class of the data using statistics such as the mean, standard deviation, and variance. To decide the class of each test datum, statistical hypothesis theory is applied to the candidate class. For evaluation, the proposed classification method is compared with the conventional fuzzy c-means method, the k-means algorithm, and the discriminant analysis algorithm. The simulation results show higher accuracy than those of the fuzzy c-means method, the k-means algorithm, and the discriminant analysis algorithm.

Customer Clustering Method Using Repeated Small-sized Clustering to improve the Classifying Ability of Typical Daily Load Profile (일일 대표 부하패턴의 분별력을 높이기 위한 반복적인 소규모 군집화를 이용한 고객 군집화 방법)

  • Kim, Young-Il;Song, Jae-Ju;Oh, Do-Eun;Jung, Nam-Joon;Yang, Il-Kwon
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.11 / pp.2269-2274 / 2009
  • A customer clustering method is used to build a TDLP (typical daily load profile) for estimating the quarter-hourly load profile of non-AMR (Automatic Meter Reading) customers. In this paper, a repeated small-sized clustering method is proposed to improve the classifying ability of the TDLP. The k-means algorithm is a well-known clustering technique in data mining. To reduce the local maxima of the k-means algorithm, the proposed method clusters average load profiles into small-sized clusters, selects the cluster with the highest error rate, and repeatedly clusters it into small-sized clusters.
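
The repeated procedure can be sketched as follows: keep a list of clusters, pick the one with the largest error, split it with a small k-means run, and repeat. Several details are assumptions, since the abstract does not fix them: the error is taken to be the within-cluster sum of squared distances, each split produces two clusters, and all function names are hypothetical. Profiles are short tuples here instead of real quarter-hourly load curves.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def sse(cluster):
    """Within-cluster sum of squared distances (assumed error measure)."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)

def split_in_two(cluster, iters=10):
    """Plain 2-means, seeded with the first member and the member farthest from it."""
    a = cluster[0]
    b = max(cluster, key=lambda p: math.dist(p, a))
    for _ in range(iters):
        left = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
        if not right:  # degenerate case: the cluster cannot be separated
            break
        a, b = centroid(left), centroid(right)
    return left, right

def repeated_splitting(profiles, n_clusters):
    """Repeatedly split the highest-error cluster until n_clusters exist."""
    clusters = [list(profiles)]
    while len(clusters) < n_clusters:
        worst = max(clusters, key=sse)
        if len(worst) < 2:
            break
        left, right = split_in_two(worst)
        if not right:
            break
        clusters.remove(worst)
        clusters += [left, right]
    return clusters

# Toy "profiles": three groups of two points each.
profiles = [(0, 0), (0, 1), (4, 0), (4, 1), (20, 0), (20, 1)]
clusters = repeated_splitting(profiles, n_clusters=3)
```

On this toy data the first split separates the distant pair from the rest, and the second split divides the remaining, higher-error cluster into its two pairs.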

Application of Genetic and Local Optimization Algorithms for Object Clustering Problem with Similarity Coefficients (유사성 계수를 이용한 군집화 문제에서 유전자와 국부 최적화 알고리듬의 적용)

  • Yim, Dong-Soon;Oh, Hyun-Seung
    • Journal of Korean Institute of Industrial Engineers / v.29 no.1 / pp.90-99 / 2003
  • Object clustering, which classifies a set of objects into groups such that objects in the same group have similar characteristics and objects in different groups have dissimilar characteristics, has been applied in diverse areas such as information retrieval, data mining, and group technology. In this study, an object clustering problem with similarity coefficients between objects is considered. First, an evaluation function for the optimization problem is defined. Then, a genetic algorithm and a heuristic local optimization technique are proposed and used to obtain near-optimal solutions. Solutions from the genetic algorithm are improved by local optimization techniques based on object relocation and cluster merging. The validity and effectiveness of the proposed algorithms are tested through extensive experiments.