• 제목/요약/키워드: model-based clustering

검색결과 737건 처리시간 0.027초

A Bayesian Model-based Clustering with Dissimilarities

  • Oh, Man-Suk;Raftery, Adrian
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2003년도 추계 학술발표회 논문집
    • /
    • pp.9-14
    • /
    • 2003
  • A Bayesian model-based clustering method is proposed for clustering objects on the basis of dissimilarites. This combines two basic ideas. The first is that tile objects have latent positions in a Euclidean space, and that the observed dissimilarities are measurements of the Euclidean distances with error. The second idea is that the latent positions are generated from a mixture of multivariate normal distributions, each one corresponding to a cluster. We estimate the resulting model in a Bayesian way using Markov chain Monte Carlo. The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainties. In the examples we studied, the clustering results based on low-dimensional configurations were almost as good as those based on high-dimensional ones. Thus tile method can be used as a tool for dimension reduction when clustering high-dimensional objects, which may be useful especially for visual inspection of clusters. We also propose a Bayesian criterion for choosing the dimension of the object configuration and the number of clusters simultaneously. This is easy to compute and works reasonably well in simulations and real examples.

  • PDF

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제13권1호
    • /
    • pp.59-66
    • /
    • 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown capable of handling highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose use of the von Mises mixture model as more befitting circular DOA data than a Gaussian distribution. The EM framework for the von Mises mixture model in a unit hyper sphere is degenerated for the 2D case and used as such in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial values of parameters, thereby saving calculation time and improving the efficiency. Experiments using simulated and real-world datasets demonstrate the performance of the proposed method.

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal
    • /
    • 제39권4호
    • /
    • pp.443-454
    • /
    • 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.

Curve Clustering in Microarray

  • Lee, Kyeong-Eun
    • Journal of the Korean Data and Information Science Society
    • /
    • 제15권3호
    • /
    • pp.575-584
    • /
    • 2004
  • We propose a Bayesian model-based approach using a mixture of Dirichlet processes model with discrete wavelet transform, for curve clustering in the microarray data with time-course gene expressions.

  • PDF

자기구성 클러스터링 기반 뉴로-퍼지 모델링 (Neuro-Fuzzy Modeling based on Self-Organizing Clustering)

  • 김승석;유정웅;김용태
    • 한국지능시스템학회논문지
    • /
    • 제15권6호
    • /
    • pp.688-694
    • /
    • 2005
  • 본 논문에서는 클러스터링을 뉴로-퍼지 모델에 직접 적용하여 모델을 최적화하는 방법을 제안하였다. 기존의 오차미분기반 학습을 통한 뉴로-퍼지 모델의 최적화 과정과는 달리 제안된 방법은 클러스터링 학습과 연계하여 모델을 구성하며 자율적으로 클러스터의 수를 추정하며 동시에 최적화를 수행한다. 순차적인 학습 기법에서는 각각의 학습 기법을 따로 적용하여 모델링을 실시하였으나 제안된 기법에서는 하나의 클러스터링 학습으로 전체 모델의 학습을 실시하였다. 또한 제안된 방법에서는 클러스터링이 수렴하는 만큼 전체 모델의 연산량이 감소하여 학습과정에서 발생하는 연산량 문제를 개선하였다. 시뮬레이션을 통하여 기존의 연구 결과들과 비교하여 제안된 기법의 유용성을 보였다.

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal
    • /
    • 제43권1호
    • /
    • pp.74-81
    • /
    • 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

시계열데이터의 모델기반 클러스터 결정 (Determining on Model-based Clusters of Time Series Data)

  • 전진호;이계성
    • 한국콘텐츠학회논문지
    • /
    • 제7권6호
    • /
    • pp.22-30
    • /
    • 2007
  • 대부분의 실세계의 시스템들, 즉 경제, 주식시장, 의료분야 등의 많은 시스템들은 동적이며 복잡한 현상을 갖는다. 이러한 특징들의 시스템을 이해하는 전형적인 방법은 시스템행위에 대한 모델을 세우고 분석하는 것이다. 본 연구에서는 실세계의 동적 시스템에서 발생되는 시계열데이터들에 대하여 최적의 클러스터를 형성하기 위한 방법을 연구한다. 먼저 클러스터 수를 결정하는 기준으로 베이지안정보기준(BIC : Bayesian Information Criterion)근사법의 활용도를 검증하고 데이터 크기와 베이지안정보기준값의 상관관계를 파악함으로 탐색 효율을 높이는 방안을 제안하며 클러스터링 과정으로 모델기반과 유사기반의 방법론을 비교 확인하여 본다. 실제의 시계열데이터(주가)에 대해 실험을 시행하였고 베이지안정보기준 근사 측도는 데이터의 크기에 따라 파티션의 사이즈를 정확히 추정하는 것을 확인하였으며 또한 유사기반의 방식보다 모델기반의 방법론이 클러스터링에서 더 나은 결과를 갖는 것을 확인하였다.

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • 제27권6호
    • /
    • pp.589-602
    • /
    • 2020
  • The development of smart grids has enabled the easy collection of a large amount of power data. There are some common patterns that make it useful to cluster power consumption patterns when analyzing s power big data. In this paper, clustering analysis is based on distance functions for time series and clustering algorithms to discover patterns for power consumption data. In clustering, we use 10 distance measures to find the clusters that consider the characteristics of time series data. A simulation study is done to compare the distance measures for clustering. Cluster validity measures are also calculated and compared such as error rate, similarity index, Dunn index and silhouette values. Real power consumption data are used for clustering, with five distance measures whose performances are better than others in the simulation.

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
    • Genomics & Informatics
    • /
    • 제1권1호
    • /
    • pp.32-39
    • /
    • 2003
  • We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. Aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Then a topographic clustering is performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle­regulated genes of the yeast Saccharomyces cerevisiae, the proposed method could discover biologically meaningful patterns related with characteristic expression behavior in particular cell cycle phases. In addition, the display of the variation in the composition of these latent patterns on the cluster map provided more facilitated interpretation of the resulting cluster structure. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for explorative analysis of gene expression data.

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • 제6권4호
    • /
    • pp.434-438
    • /
    • 2008
  • Massive data are continuously produced with a data rate of over several terabytes every day. These applications need effective clustering algorithms to achieve an overall high performance computation. In this paper, we propose ancestor as cluster center based approach to clustering, the K-means algorithm using ancestor. We modify the K-means algorithm. We present a clustering architecture and a clustering algorithm that minimize of I/Os and show a performance with excellent. In our experimental performance evaluation, we present that our algorithm can improve the I/O speed and the query processing time.