• 제목/요약/키워드: Data clustering

검색결과 2,765건 처리시간 0.041초

A Clustering Tool Using Particle Swarm Optimization for DNA Chip Data

  • Han, Xiaoyue;Lee, Min-Soo
    • Genomics & Informatics
    • /
    • 제9권2호
    • /
    • pp.89-91
    • /
    • 2011
  • DNA chips are becoming increasingly popular as a convenient way to perform vast amounts of experiments related to genes on a single chip. And the importance of analyzing the data that is provided by such DNA chips is becoming significant. A very important analysis on DNA chip data would be clustering genes to identify gene groups which have similar properties such as cancer. Clustering data for DNA chips usually deal with a large search space and has a very fuzzy characteristic. The Particle Swarm Optimization algorithm which was recently proposed is a very good candidate to solve such problems. In this paper, we propose a clustering mechanism that is based on the Particle Swarm Optimization algorithm. Our experiments show that the PSO-based clustering algorithm developed is efficient in terms of execution time for clustering DNA chip data, and thus be used to extract valuable information such as cancer related genes from DNA chip data with high cluster accuracy and in a timely manner.

Design and Comparison of Error Correctors Using Clustering in Holographic Data Storage System

  • Kim, Sang-Hoon;Kim, Jang-Hyun;Yang, Hyun-Seok;Park, Young-Pil
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2005년도 ICCAS
    • /
    • pp.1076-1079
    • /
    • 2005
  • Data storage related with writing and retrieving requires high storage capacity, fast transfer rate and less access time in. Today any data storage system can not satisfy these conditions, but holographic data storage system can perform faster data transfer rate because it is a page oriented memory system using volume hologram in writing and retrieving data. System architecture without mechanical actuating part is possible, so fast data transfer rate and high storage capacity about 1Tb/cm3 can be realized. In this paper, to correct errors of binary data stored in holographic digital data storage system, find cluster centers using clustering algorithm and reduce intensities of pixels around centers. We archive the procedure by two algorithms of C-mean and subtractive clustering, and compare the results of the two algorithms. By using proper clustering algorithm, the intensity profile of data page will be uniform and the better data storage system can be realized.

  • PDF

Design and Comparison of Error Reduction Methods Using Clustering in Holographic Data Storage System (홀로그래픽 정보 저장 장치에서 클러스터링을 이용한 에러 감소 기법 제안 및 비교)

  • Kim Sang-Hoon;Kim Jang-Hyun;Yang Hyun-Seok;Park Young-Pil
    • 정보저장시스템학회:학술대회논문집
    • /
    • 정보저장시스템학회 2005년도 추계학술대회 논문집
    • /
    • pp.83-87
    • /
    • 2005
  • Data storage related with writing and retrieving requires high storage capacity, fast transfer rate and less access time in. Today any data storage system can not satisfy these conditions, but holographic data storage system can perform faster data transfer rate because it is a page oriented memory system using volume hologram in writing and retrieving data. System architecture without mechanical actuating pare is possible, so fast data transfer rate and high storage capacity about 1Tb/cm3 can be realized. In this paper, to correct errors of binary data stored in holographic digital data storage system, find cluster centers using clustering algorithm and reduce intensities of pixels around centers. We archive the procedure by two algorithms of C-mean and subtractive clustering, and compare the results of the two algorithms. By using proper clustering algorithm, the intensity profile of data page will be uniform and the better data storage system can be realized.

  • PDF

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • 제27권6호
    • /
    • pp.589-602
    • /
    • 2020
  • The development of smart grids has enabled the easy collection of a large amount of power data. There are some common patterns that make it useful to cluster power consumption patterns when analyzing s power big data. In this paper, clustering analysis is based on distance functions for time series and clustering algorithms to discover patterns for power consumption data. In clustering, we use 10 distance measures to find the clusters that consider the characteristics of time series data. A simulation study is done to compare the distance measures for clustering. Cluster validity measures are also calculated and compared such as error rate, similarity index, Dunn index and silhouette values. Real power consumption data are used for clustering, with five distance measures whose performances are better than others in the simulation.

K-means Clustering using Grid-based Representatives

  • Park, Hee-Chang;Lee, Sun-Myung
    • Journal of the Korean Data and Information Science Society
    • /
    • 제16권4호
    • /
    • pp.759-768
    • /
    • 2005
  • K-means clustering has been widely used in many applications, such that pattern analysis, data analysis, market research and so on. It can identify dense and sparse regions among data attributes or object attributes. But k-means algorithm requires many hours to get k clusters, because it is more primitive and explorative. In this paper we propose a new method of k-means clustering using the grid-based representative value(arithmetic and trimmed mean) for sample. It is more fast than any traditional clustering method and maintains its accuracy.

  • PDF

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

  • 황인수
    • Korean Management Science Review
    • /
    • 제19권1호
    • /
    • pp.189-196
    • /
    • 2002
  • Data clustering is often one of the first steps in data mining analysis. It Identifies groups of related objects that can be used as a starling point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper Purpose to present the development of two phase hierarchical clustering algorithm for group formation. Applications of the algorithm for product-customer group formation in customer relationahip management are also discussed. As a result of computer simulations, suggested algorithm outperforms single link method and k-means clustering.

A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods
    • /
    • 제12권2호
    • /
    • pp.509-520
    • /
    • 2005
  • Data fusion is defined as the process of combining data and information from different sources for the effectiveness of the usage of useful information contents. In this paper, we propose a data fusion algorithm using k-means clustering method for data enrichment to improve data quality in knowledge discovery in database(KDD) process. An empirical study was conducted to compare the proposed data fusion technique with the existing techniques and shows that the newly proposed clustering data fusion technique has low MSE in continuous fusion variables.

A Stigmergy-and-Neighborhood Based Ant Algorithm for Clustering Data

  • Lee, Hee-Sang;Shim, Gyu-Seok
    • Management Science and Financial Engineering
    • /
    • 제15권1호
    • /
    • pp.81-96
    • /
    • 2009
  • Data mining, specially clustering is one of exciting research areas for ant based algorithms. Ant clustering algorithm, however, has many difficulties for resolving practical situations in clustering. We propose a new grid-based ant colony algorithm for clustering of data. The previous ant based clustering algorithms usually tried to find the clusters during picking up or dropping down process of the items of ants using some stigmergy information. In our ant clustering algorithm we try to make the ants reflect neighborhood information within the storage nests. We use two ant classes, search ants and labor ants. In the initial step of the proposed algorithm, the search ants try to guide the characteristics of the storage nests. Then the labor ants try to classify the items using the guide in-formation that has set by the search ants and the stigmergy information that has set by other labor ants. In this procedure the clustering decision of ants is quickly guided and keeping out of from the stagnated process. We experimented and compared our algorithm with other known algorithms for the known and statistically-made data. From these experiments we prove that the suggested ant mining algorithm found the clusters quickly and effectively comparing with a known ant clustering algorithm.

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB
    • /
    • 제10B권1호
    • /
    • pp.33-40
    • /
    • 2003
  • Data mining plays an important role in a knowledge discovery process and various algorithms of data mining can be selected for the specific purpose. Most of traditional hierachical clustering methode are suitable for processing small data sets, so they difficulties in handling large data sets because of limited resources and insufficient efficiency. In this study we propose a hybrid neural networks clustering technique, called PPC for Pre-Post Clustering that can be applied to large data sets and find unknown patterns. PPC combinds an artificial intelligence method, SOM and a statistical method, hierarchical clustering technique, and clusters data through two processes. In pre-clustering process, PPC digests large data sets using SOM. Then in post-clustering, PPC measures Similarity values according to cohesive distances which show inner features, and adjacent distances which show external distances between clusters. At last PPC clusters large data sets using the simularity values. Experiment with UCI repository data showed that PPC had better cohensive values than the other clustering techniques.

Clustering Algorithm for Sequences of Categorical Values (범주형 값들이 순서를 가지고 있는 데이터들의 클러스터링 기법)

  • 오승준;김재련
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • 제26권1호
    • /
    • pp.17-21
    • /
    • 2003
  • We study clustering algorithm for sequences of categorical values. Clustering is a data mining problem that has received significant attention by the database community. Traditional clustering algorithms deal with numerical or categorical data points. However, there exist many important databases that store categorical data sequences. In this paper, we introduce new similarity measure and develop a hierarchical clustering algorithm. An experimental section shows performance of the proposed approach.