VS-FCM: Validity-guided Spatial Fuzzy c-Means Clustering for Image Segmentation

  • Kang, Bo-Yeong;Kim, Dae-Won
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.89-93 / 2010
  • This paper proposes a new fuzzy clustering approach to the color clustering problem. To address the limitations of the traditional FCM algorithm, we propose a spatial homogeneity-based FCM algorithm. Moreover, a cluster validity index is employed to automatically determine the number of clusters for a given image. We refer to this method as the VS-FCM algorithm. The effectiveness of the proposed method is demonstrated through various clustering examples.

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1353-1359 / 2010
  • The basic objective of cluster analysis is to discover natural groupings of items. In general, clustering is based on a similarity (or dissimilarity) matrix or on the original input data. Various measures of similarity between objects have been developed. In this paper, we consider the clustering of a huge real categorical data set that describes the time-location-activity patterns of Korean people. Some useful similarity measures for the categorical variables are developed and adopted for this data set. Hierarchical and nonhierarchical clustering methods are applied to the data set, which is huge and consists of many categorical variables.

More Efficient k-Modes Clustering Algorithm

  • Kim, Dae-Won;Chae, Yi-Geun
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.549-556 / 2005
  • Hard-type centroids in conventional clustering algorithms such as the k-modes algorithm cannot preserve the uncertainty inherent in data sets for as long as possible before the actual clustering decisions are made. Therefore, we propose the k-populations algorithm to extend clustering ability and to heed the data characteristics. The k-populations algorithm was found to give markedly better clustering results in various experiments.
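
To make the point about hard-type centroids concrete, the following minimal sketch contrasts a conventional k-modes centroid (one category per attribute) with a per-attribute frequency table for the same cluster. It is not the authors' k-populations algorithm; the function names, the toy records, and the frequency-table representation are assumptions chosen only to show what information a hard mode discards.

```python
from collections import Counter


def hard_mode(records):
    """Conventional k-modes centroid: the most frequent category per attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def category_frequencies(records):
    """Relative frequency of every category per attribute (illustrative only)."""
    n = len(records)
    return [
        {value: count / n for value, count in Counter(column).items()}
        for column in zip(*records)
    ]


def simple_matching(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical cluster of categorical records (color, size).
cluster = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
]

mode = hard_mode(cluster)              # keeps only the winning category
freqs = category_frequencies(cluster)  # keeps the minority categories as well
distance = simple_matching(("blue", "large"), mode)
```

The hard mode records only the majority category of each attribute, while the frequency table still shows that a quarter of the records are "blue" and a quarter are "large".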

A Study on K-Means Clustering

  • Bae, Wha-Soo;Roh, Se-Won
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.497-508 / 2005
  • This paper studies K-means clustering, focusing on initialization, which affects the clustering results in K-means cluster analysis. Four methods (the MA method, the KA method, the Max-Min method and the Space Partition method) were compared. The clustering results show some differences among the methods: the MA method sometimes leads to incorrect clustering because of inappropriate initialization, depending on the type of data, and the Max-Min method is more effective than the other methods, especially when the data size is large.
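
As an illustration of how an initialization rule of this kind can work, here is a minimal sketch of a farthest-first (max-min) seeding procedure: each new center is the point whose distance to its nearest already-chosen center is largest. The abstract does not define the Max-Min method, so treating it as farthest-first selection, starting from the first point, and the names `max_min_init` and `points` are all assumptions.

```python
import math


def max_min_init(points, k):
    """Pick k initial centers by farthest-first (max-min) selection."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [points[0]]  # assumption: seed with the first point
    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        min_dist = [
            min(d, math.dist(p, points[idx])) for d, p in zip(min_dist, points)
        ]
    return centers


# Hypothetical 2-D data with three visually separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.0, 9.0)]
centers = max_min_init(points, 3)
```

The returned centers would then be passed to an ordinary K-means loop as its starting centroids; spreading the seeds apart is what reduces the risk of two seeds landing in the same natural group.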

Detected Point Clustering Algorithm For Automatic Visual Inspection (자동외관검사를 위한 검출위치 클러스터링 알고리즘)

  • Ryu, Sun Joong
    • Journal of the Semiconductor & Display Technology / v.13 no.3 / pp.1-6 / 2014
  • Visual defect inspection in electronic parts manufacturing comprises two steps: automatic visual inspection by machine and inspection by human inspectors. The spatial points detected by the machine must be adequately clustered for subsequent human inspection. This research deals with a spatial clustering algorithm aimed at improving process productivity. A distribution-based clustering algorithm is newly developed and experimentally confirmed to show better clustering efficiency than the existing area-based clustering algorithm.

Double monothetic clustering for histogram-valued data

  • Kim, Jaejik;Billard, L.
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.263-274 / 2018
  • One of the common issues in large dataset analyses is to detect and construct homogeneous groups of objects in those datasets. This is typically done by some form of clustering technique. In this study, we present a divisive hierarchical clustering method for two monothetic characteristics of histogram data. Unlike a classical data point, a histogram has its own internal variation as well as location information. However, to find the optimal bipartition, existing divisive monothetic clustering methods for histogram data consider only location information as a monothetic characteristic, so they cannot distinguish histograms with the same location but different internal variations. Thus, a divisive clustering method considering both location and internal variation of histograms is proposed in this study. The method has an advantage in interpreting clustering outcomes by providing binary questions for each split. The proposed clustering method is verified through a simulation study and applied to a large U.S. house property value dataset.

Development of Similarity-Based Document Clustering System (유사성 계수에 의한 문서 클러스터링 시스템 개발)

  • Woo Hoon-Shik;Yim Dong-Soon
    • Proceedings of the Society of Korea Industrial and System Engineering Conference / pp.119-124 / 2002
  • Clustering of data is of great interest in many data mining applications. In document clustering, a document is represented as a data point in a high-dimensional space. Therefore, document clustering can be accomplished with general data clustering techniques. In this paper, we introduce a document clustering system based on similarity among documents. The developed system consists of three functions: 1) gathering documents using a search agent; 2) determining similarity coefficients between every pair of documents from term frequencies; 3) clustering documents using the similarity coefficients. In particular, the document clustering is accomplished by a hybrid algorithm that combines genetic and K-Means methods.
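
The second function, deriving pairwise similarity coefficients from term frequencies, can be sketched as follows. The abstract does not say which coefficient the system uses, so cosine similarity is an assumption here, as are the whitespace tokenizer, the function names, and the toy documents.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus.
docs = [
    "fuzzy clustering of images",
    "clustering of documents",
    "markov chain monte carlo",
]
tfs = [term_frequencies(d) for d in docs]

# Pairwise similarity matrix that a clustering step could consume.
sim = [[cosine_similarity(a, b) for b in tfs] for a in tfs]
```

In this toy corpus the first two documents share two terms and receive a similarity of about 0.58, while the third shares none with either and receives 0.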

A Bayesian Model-based Clustering with Dissimilarities

  • Oh, Man-Suk;Raftery, Adrian
    • Proceedings of the Korean Statistical Society Conference / pp.9-14 / 2003
  • A Bayesian model-based clustering method is proposed for clustering objects on the basis of dissimilarities. This combines two basic ideas. The first is that the objects have latent positions in a Euclidean space, and that the observed dissimilarities are measurements of the Euclidean distances with error. The second idea is that the latent positions are generated from a mixture of multivariate normal distributions, each one corresponding to a cluster. We estimate the resulting model in a Bayesian way using Markov chain Monte Carlo. The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainties. In the examples we studied, the clustering results based on low-dimensional configurations were almost as good as those based on high-dimensional ones. Thus the method can be used as a tool for dimension reduction when clustering high-dimensional objects, which may be useful especially for visual inspection of clusters. We also propose a Bayesian criterion for choosing the dimension of the object configuration and the number of clusters simultaneously. This is easy to compute and works reasonably well in simulations and real examples.
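
The two ideas can be written compactly as below. The notation is an assumption introduced for illustration (observed dissimilarity \delta_{ij}, latent position x_i in p dimensions, G clusters with weights \pi_k), and the error term \varepsilon_{ij} is left unspecified because the abstract does not state its distribution.

```latex
% Idea 1: observed dissimilarities are noisy measurements of
% Euclidean distances between latent positions.
\delta_{ij} = \lVert x_i - x_j \rVert + \varepsilon_{ij}

% Idea 2: latent positions come from a mixture of multivariate
% normal distributions, one component per cluster.
x_i \sim \sum_{k=1}^{G} \pi_k \, N_p(\mu_k, \Sigma_k),
\qquad \sum_{k=1}^{G} \pi_k = 1
```

Under this reading, choosing p corresponds to choosing the dimension of the object configuration and choosing G to choosing the number of clusters, which are the two quantities the proposed Bayesian criterion selects simultaneously.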

Development of the Combinatorial Agglomerative Hierarchical Clustering Method Using the Measure of Cohesion (응집력 척도를 활용한 계층별-조결합군락화 기법의 개발)

  • Jeong, Hyeon-Tae;Choe, In-Su
    • Journal of the Korean Society for Quality Management / v.18 no.1 / pp.48-54 / 1990
  • The purpose of this study is to design effective working systems that adapt to changes in human needs by developing a method that forms optimal groups using the measure of cohesion. Two main results can be derived from the study. First, the clustering method based on the entropic measure of cohesion is superior to the other methods proposed for designing work groups: because this clustering criterion includes the symmetrical relations of the total work group and the dissimilarity as well as the similarity relations of predicate values, the clustering method based on it is suitable for designing a new work structure. Second, the total work group is first clustered into workers who have equal predicate values, and the clustering results are then produced through the combinatorial agglomerative hierarchical clustering method. This clustering method yields more economical results than a method that clusters the total work group.

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.1-11 / 2006
  • Data mining techniques are used to find hidden knowledge, unexpected patterns, relations, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and off-line or on-line market research. We analyze the 2001 Gyeongnam social indicator survey data using the twostep clustering technique for environmental information. Twostep clustering is classified as a partitional clustering method. These twostep clustering outputs can be applied to environmental preservation and improvement.
