• Title/Abstract/Keywords: and clustering

Search results: 5,592 items (processing time: 0.035 s)

A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu;Congcong Zhao;Hao Chao
    • Journal of Information Processing Systems / Vol. 19, No. 6 / pp.778-790 / 2023
  • Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measurement. To address this limitation, we draw inspiration from the information bottleneck theory, and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments using ten carefully selected datasets and compared the results with three other algorithms. The experimental results demonstrate that our information bottleneck-based density peaks clustering (IBDPC) algorithm consistently achieves high levels of accuracy, highlighting its potential as a valuable tool for data clustering tasks.
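The idea of using loss of mutual information as a dissimilarity can be illustrated with a small sketch. This is not the IBDPC algorithm itself, whose exact formulas are not given in the abstract. It assumes the standard agglomerative information bottleneck merge cost (a weighted Jensen-Shannon divergence between conditional distributions) and the usual density-peak quantities, local density rho and distance delta to the nearest denser point. The function names, the toy count matrix, and the cutoff `dc` are hypothetical.

```python
import numpy as np


def kl(p, q):
    """KL divergence in nats; terms with p == 0 contribute zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def ib_merge_cost(joint):
    """Pairwise loss of mutual information I(X;Y) if objects i and j were merged.

    `joint` is p(x, y): rows are data objects, columns are features.
    Every row is assumed to have a positive sum.
    """
    px = joint.sum(axis=1)          # marginal p(x)
    cond = joint / px[:, None]      # conditional p(y|x)
    n = len(px)
    cost = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = px[i] + px[j]
            pi_i, pi_j = px[i] / w, px[j] / w
            mix = pi_i * cond[i] + pi_j * cond[j]
            js = pi_i * kl(cond[i], mix) + pi_j * kl(cond[j], mix)
            cost[i, j] = cost[j, i] = w * js
    return cost


def density_peaks(dist, dc):
    """Local density rho and distance delta to the nearest denser point."""
    rho = (dist < dc).sum(axis=1) - 1   # neighbours within dc, excluding self
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        denser = rho > rho[i]
        # The densest points get the largest distance in their row by convention.
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return rho, delta


# Toy object-by-feature counts (hypothetical): rows 0-1 and rows 2-3 share a profile.
counts = np.array([[8, 1, 1], [16, 2, 2], [1, 1, 8], [2, 2, 16]], dtype=float)
joint = counts / counts.sum()
cost = ib_merge_cost(joint)
rho, delta = density_peaks(cost, dc=0.01)
```

Objects with identical conditional feature distributions cost nothing to merge, whatever their geometric distance, which is the property that makes this measure an alternative to a hand-picked metric.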

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods / Vol. 22, No. 1 / pp.55-67 / 2015
  • An important problem in cluster analysis is the selection of variables that define cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.

Separation of Adjacent Targets using Range-Doppler Clustering Method

  • 공영주;우선걸;박성호;유성현;강연덕
    • 한국인터넷방송통신학회논문지 / Vol. 20, No. 2 / pp.67-73 / 2020
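  • A clustering algorithm is a method of grouping data with similar characteristics into the same group. In radar systems, it is mainly used to merge adjacent hits in the output of the CFAR algorithm into one. When targets are adjacent, however, ordinary clustering often detects them as a single target. This paper describes a two-stage clustering scheme for separating adjacent targets. To reduce computation time, clustering is first performed in the range direction, and the range-direction result is then used to cluster in the Doppler direction. Because clustering is performed separately in the range and Doppler directions, the computation time changes very little even as the number of targets increases.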
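The range-then-Doppler scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: hits are assumed to be (range_bin, doppler_bin) integer pairs from a CFAR detector, and the `max_gap` adjacency threshold and the function names are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # (range_bin, doppler_bin); assumed CFAR output format


def split_by_gap(hits: List[Hit], axis: int, max_gap: int = 1) -> List[List[Hit]]:
    """Group hits whose sorted coordinates along `axis` differ by at most max_gap."""
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: h[axis]):
        if groups and hit[axis] - groups[-1][-1][axis] <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def range_doppler_clusters(hits: List[Hit], max_gap: int = 1) -> List[List[Hit]]:
    """Cluster along range first, then split each range cluster along Doppler."""
    clusters: List[List[Hit]] = []
    for range_group in split_by_gap(hits, axis=0, max_gap=max_gap):
        clusters.extend(split_by_gap(range_group, axis=1, max_gap=max_gap))
    return clusters


# Two targets share range bins 10-11 but sit at different Doppler bins.
hits = [(10, 5), (11, 5), (10, 20), (11, 21), (30, 7)]
clusters = range_doppler_clusters(hits)
```

In this toy input, range-only clustering would merge the first four hits into one target; the second pass along Doppler separates them. Each pass is a sort followed by a linear scan.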

A Context-Aware Information Service using FCM Clustering Algorithm and Fuzzy Decision Tree

  • 양석환;정목동
    • 한국멀티미디어학회논문지 / Vol. 16, No. 7 / pp.810-819 / 2013
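  • The FCM clustering algorithm is a representative partition-based clustering algorithm and has been applied successfully in many fields. However, it suffers from high sensitivity to noise and local data, a high likelihood of producing results that differ from intuitive ones, and the difficulty of setting the initial prototypes and the number of clusters. In this paper, we propose a system that maps the results of the FCM algorithm onto the data axis of the corresponding attribute to determine fuzzy intervals, and then applies those intervals to a fuzzy decision tree (FDT). This mitigates two of the problems of FCM: the high sensitivity to noise and data, and the high likelihood of results that differ from intuitive ones. We also compare the proposed model with the FCM clustering algorithm in experiments on real traffic data and precipitation data. The experimental results show that the proposed model provides more stable results by reducing sensitivity to noise and data, and that it agrees with intuitive results more often than a system that applies the FCM clustering algorithm.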
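For orientation, the baseline that this abstract builds on can be sketched in a few lines. This is the textbook fuzzy c-means update on a single attribute, not the proposed FCM-plus-FDT system; the fuzzifier `m = 2`, the iteration count, the random initialisation, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(x, c, m=2.0, iters=100, seed=0):
    """Textbook fuzzy c-means: returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)   # memberships of each point sum to 1
    for _ in range(iters):
        um = u ** m
        # Centers are membership-weighted means of the data.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # u_ik is proportional to d_ik^(-2/(m-1)), normalised over clusters.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u


# Synthetic one-attribute data with two groups (hypothetical values).
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(20.0, 2.0, 50), rng.normal(80.0, 2.0, 50)])
values = values.reshape(-1, 1)
centers, u = fuzzy_c_means(values, c=2)
```

Projected onto the attribute axis, the membership curves in `u` are the kind of output from which fuzzy intervals could be derived; how the paper actually derives them is not stated in the abstract.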

A new Clustering Algorithm for the Scanned Infrared Image of the Rosette Seeker

  • 장성갑;홍현기;두경수;오정수;최종수;서동선
    • 대한전자공학회논문지SP / Vol. 37, No. 2 / pp.1-14 / 2000
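  • A rosette scanning seeker is a device mounted on an infrared-guided missile to track targets. A single-element detector scans the field in a rosette pattern to acquire a two-dimensional image of the target. Because the shape of the detected image varies with its position in the field of view and the number of objects is not fixed, unsupervised clustering is used to distinguish the objects. The conventional ISODATA method clusters by the distance between seed points and the pixels under consideration, so its results deviate from reality when object shapes are complex or when the merge and split parameter values change. This paper proposes ALCA (Array Linkage Clustering Algorithm), a new clustering method that overcomes these shortcomings. ALCA clusters by using the continuity of the memory addresses at which pixels are stored, so it needs neither initial seed points nor merge and split parameters. It can therefore cluster objects regardless of their shape. We confirm the superiority of the proposed method by comparing its clustering of objects with that of the conventional method. We also use ALCA as the counter-countermeasure capability of the rosette scanning seeker and carry out tracking experiments on a three-dimensional simulator. Comparison with the conventional method confirms that ALCA performs well as the counter-countermeasure capability of the rosette scanning seeker.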


A Clustering Tool Using Particle Swarm Optimization for DNA Chip Data

  • Han, Xiaoyue;Lee, Min-Soo
    • Genomics & Informatics / Vol. 9, No. 2 / pp.89-91 / 2011
  • DNA chips are becoming increasingly popular as a convenient way to perform vast numbers of gene-related experiments on a single chip, and analyzing the data that such chips provide is becoming increasingly important. A very important analysis of DNA chip data is clustering genes to identify gene groups that share similar properties, such as a relation to cancer. Clustering DNA chip data usually involves a large search space, and the data have a very fuzzy character. The recently proposed Particle Swarm Optimization algorithm is a very good candidate for solving such problems. In this paper, we propose a clustering mechanism based on the Particle Swarm Optimization algorithm. Our experiments show that the PSO-based clustering algorithm we developed is efficient in terms of execution time for clustering DNA chip data, and can thus be used to extract valuable information, such as cancer-related genes, from DNA chip data with high cluster accuracy and in a timely manner.

Descriptive and Systematic Comparison of Clustering Methods in Microarray Data Analysis

  • Kim, Seo-Young
    • 응용통계연구 / Vol. 22, No. 1 / pp.89-106 / 2009
  • There have been many new advances in the development of improved clustering methods for microarray data analysis, but traditional clustering methods are still often used in genomic data analysis, which may be more due to their conceptual simplicity and their broad usability in commercial software packages than to their intrinsic merits. Thus, it is crucial to assess the performance of each existing method through a comprehensive comparative analysis so as to provide informed guidelines on choosing clustering methods. In this study, we investigated existing clustering methods applied to microarray data in various real scenarios. To this end, we focused on how the various methods differ, and why a particular method does not perform well. We applied both internal and external validation methods to the following eight clustering methods using various simulated data sets and real microarray data sets.

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

  • Ma, Zhihua;Xue, Yishu;Hu, Guanyu
    • Communications for Statistical Applications and Methods / Vol. 26, No. 1 / pp.57-67 / 2019
  • Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions in different regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms of the income distributions, or using summary statistics like the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. In simulation studies, it is shown that the energy distance based method has competitive results with other common methods in hypothesis testing, and the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied in analyzing data from China Health and Nutrition Survey 2011. The results indicate that there are significant differences among income distributions of the 12 provinces in the dataset. After applying a 4-means clustering algorithm, we obtained the clustering results of the income distributions in the 12 provinces.
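The energy distance that this abstract relies on can be estimated directly from two samples. The sketch below is a plain V-statistic estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| for one-dimensional data. It is an illustration only: the function name is hypothetical, the synthetic log-normal "incomes" are not the survey data, and the paper's test and clustering procedures are not reproduced here.

```python
import numpy as np


def energy_distance(x, y) -> float:
    """V-statistic estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.abs(x[:, None] - y[None, :]).mean()   # between-sample term
    xx = np.abs(x[:, None] - x[None, :]).mean()   # within-sample term for x
    yy = np.abs(y[:, None] - y[None, :]).mean()   # within-sample term for y
    return float(2.0 * xy - xx - yy)


# Synthetic income samples for two hypothetical regions.
rng = np.random.default_rng(0)
region_a = rng.lognormal(mean=10.0, sigma=0.5, size=200)
region_b = rng.lognormal(mean=10.5, sigma=0.5, size=200)
d_same = energy_distance(region_a, region_a)
d_diff = energy_distance(region_a, region_b)
```

Because the estimate needs only pairwise absolute differences, no parametric form for the income distribution has to be assumed, which matches the nonparametric motivation in the abstract.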

Comprehensive review on Clustering Techniques and its application on High Dimensional Data

  • Alam, Afroj;Muqeem, Mohd;Ahmad, Sultan
    • International Journal of Computer Science & Network Security / Vol. 21, No. 6 / pp.237-244 / 2021
  • Clustering is one of the most powerful unsupervised machine learning techniques for dividing instances into homogeneous groups, called clusters. Clustering is mainly used to generate good-quality clusters through which hidden patterns and knowledge can be discovered in large datasets. It has wide application in fields such as medicine, healthcare, gene expression, image processing, agriculture, fraud detection, and profitability analysis. The goal of this paper is to explore both hierarchical and partitioning clustering and to understand their problems, along with various approaches to solving them. Among the different clustering methods, K-means is better than the others owing to its linear time complexity. The paper also covers data mining on high-dimensional datasets, the problems it raises, and the existing approaches to them.

K-means Clustering using Grid-based Representatives

  • Park, Hee-Chang;Lee, Sun-Myung
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 4 / pp.759-768 / 2005
  • K-means clustering has been widely used in many applications, such as pattern analysis, data analysis, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm requires many hours to obtain k clusters, because it is more primitive and explorative. In this paper we propose a new k-means clustering method that uses grid-based representative values (arithmetic and trimmed means) of the sample. It is faster than traditional clustering methods while maintaining accuracy.
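One way to picture clustering on grid-based representatives is sketched below. It is an assumption-laden illustration rather than the authors' procedure: points are binned into square cells of a hypothetical width `cell`, each non-empty cell is summarised by its arithmetic mean (the trimmed-mean variant is not shown), and a plain Lloyd iteration weighted by cell counts runs on the representatives.

```python
import numpy as np


def grid_representatives(points, cell):
    """Mean and count of the points falling in each non-empty grid cell."""
    keys = np.floor(points / cell).astype(int)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)    # accumulate the points of each cell
    return sums / counts[:, None], counts


def weighted_kmeans(reps, weights, k, iters=50, seed=0):
    """Lloyd's algorithm on representatives, weighting each by its cell count."""
    rng = np.random.default_rng(seed)
    centers = reps[rng.choice(len(reps), size=k, replace=False)]
    for _ in range(iters):
        d = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(reps[mask], axis=0, weights=weights[mask])
    return centers, labels


# Two synthetic, well-separated groups of 500 points each.
rng = np.random.default_rng(1)
points = np.vstack(
    [rng.normal(0.0, 0.5, (500, 2)), rng.normal(10.0, 0.5, (500, 2))]
)
reps, counts = grid_representatives(points, cell=0.5)
centers, labels = weighted_kmeans(reps, counts, k=2)
```

The distance computations run over the representatives instead of all 1,000 points, which is where a speed-up would come from. Weighting by cell count makes each center equal to the mean of the original points in its assigned cells.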
