• Title/Summary/Keyword: Two-step Clustering

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Misuta, Masahiro
    • Communications for Statistical Applications and Methods / v.21 no.3 / pp.225-234 / 2014
  • We propose a novel hierarchical clustering method for distribution-valued dissimilarities. Analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze an object with internal variation, such as an interval, a histogram, or a distribution, called a symbolic object. In this study, we focus on cluster analysis for distribution-valued dissimilarities, one kind of symbolic object. Hierarchical clustering generally has two steps: a find-out step and an update step. In the find-out step, we find the nearest pair of clusters; we extend this step to distribution-valued dissimilarities by introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We present a practical example of the proposed method and a simulation study.
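The find-out/update loop described in this abstract can be sketched for ordinary scalar dissimilarities. This is a minimal illustration, not the authors' method: it does not handle distribution-valued dissimilarities, and the size-weighted average in the update step is only a scalar analogue of a mixture with a mixing ratio. The function name `agglomerate` and its arguments are hypothetical.

```python
from itertools import combinations

def agglomerate(matrix, n_clusters=1):
    """Agglomerative clustering on a symmetric scalar dissimilarity matrix.

    Returns (clusters, merges): the remaining clusters as tuples of item
    indices, and the merge history as (members, dissimilarity) pairs.
    """
    clusters = {i: (i,) for i in range(len(matrix))}
    # Dissimilarity between every pair of current clusters, keyed by id pair.
    d = {frozenset((i, j)): matrix[i][j] for i, j in combinations(clusters, 2)}
    merges = []
    next_id = len(matrix)
    while len(clusters) > n_clusters:
        # Find-out step: pick the nearest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d[pair]
        # Update step: dissimilarity from the merged cluster to every other
        # cluster is a size-weighted average (assumption: average linkage).
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            updated[c] = (na * d[frozenset((a, c))]
                          + nb * d[frozenset((b, c))]) / (na + nb)
        merged = clusters.pop(a) + clusters.pop(b)
        # Drop entries involving the two merged clusters, then add the new ones.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in updated.items():
            d[frozenset((next_id, c))] = value
        clusters[next_id] = merged
        merges.append((merged, height))
        next_id += 1
    return list(clusters.values()), merges
```

Calling `agglomerate(matrix, n_clusters=2)` on a 4x4 matrix stops once two clusters remain; with `n_clusters=1` the merge history describes the full dendrogram.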

Two-step Clustering Method Using Time Schema for Performance Improvement in Recommender Systems (추천시스템의 성능 향상을 위한 시간스키마 적용 2단계 클러스터링 기법)

  • Bu Jong-Su;Hong Jong-Kyu;Park Won-Ik;Kim Ryong;Kim Young-Kuk
    • The Journal of Society for e-Business Studies / v.10 no.2 / pp.109-132 / 2005
  • With the flood of multimedia content over digital TV channels, the internet, and other media, users often have difficulty finding their preferred content, spend considerable time surfing for it, and are likely to miss it while searching. In this paper we propose a two-step clustering technique using a time schema that lets the system recommend a user's preferred content based on collaborative filtering, which has proved successful when new users appear. In the first step, the method maps users' profiles by gender and age and recommends accordingly; in the second step, it recommends items through probabilistic item clustering of customers who chose the same item at the same time, based on the time schema. In addition, the method improves prediction accuracy and computation time by reflecting feedback on the recommender engine's results and dynamically updating customers' preferences.

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

  • Park, Nojin;Ko, Hanseok
    • Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked auto-encoder based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality with a convolutional auto-encoder, which captures and reflects local information, and then clusters similar data samples with a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering accuracy and normalized mutual information.

Metastasis Related Gene Exploration Using TwoStep Clustering for Medulloblastoma Microarray Data

  • Ban, Sung-Su;Park, Hee-Chang
    • Proceedings of the Korean Data and Information Science Society Conference / 2005.10a / pp.153-159 / 2005
  • Microarray gene expression technology has applications that could refine diagnosis and therapeutic monitoring as well as improve disease prevention through risk assessment and early detection. In particular, microarray expression data can, with appropriate analysis, provide important information about specific genes related to metastasis. Various methods for cluster analysis of microarray data have been introduced so far. We used TwoStep clustering to ascertain metastasis-related genes through a t-test. Using t-tests between two groups on two publicly available medulloblastoma microarray data sets, we sought genes significant for metastasis. The paper describes in detail how clustering analysis and the t-test are applied to microarray data sets and how the metastasis-associated genes are explored.
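The two-group t-test screening mentioned in this abstract can be illustrated as follows. This is a hedged sketch only: the abstract does not say which t-test variant was used, so Welch's unequal-variance statistic is an assumption here, and the function names, gene names, and data layout are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (assumes non-constant samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_genes(expr_a, expr_b):
    """Rank genes by |t| between two groups.

    expr_a, expr_b: dicts mapping a gene name to its expression values
    in group A and group B (hypothetical layout).
    """
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
```

Genes at the top of the returned list differ most between the two groups relative to their within-group variability; a real analysis would also convert the statistics to p-values and correct for multiple testing.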

Mobile App Analytics using Media Repertoire Approach (미디어 레퍼토리를 이용한 스마트폰 애플리케이션 이용 패턴 유형 분석)

  • Kwon, Sung Eun;Jang, Shu In;Hwangbo, Hyunwoo
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.133-154 / 2021
  • Today the smartphone is the most common medium, with the 'application' as its vehicle. To understand how media users select applications and build their repertoires, this study applied a two-step approach to big data from four weeks of smartphone logs in November 2019 and classified users into eight media repertoire groups. Each of the eight groups differed from the others in time spent per mobile application category and in demographic distribution. Beyond the academic contribution of identifying mobile application repertoires from large-scale behavioral data, the study proposes a two-step approach that overcomes the 'outlier issue' in behavioral data by extracting prototype vectors with a SOM (Self-Organizing Map) and then applying k-means clustering to them to optimize the classification. The study is also meaningful in that it categorizes customers of e-commerce services, identifies customer structure from behavioral data, and offers practical guidance to e-commerce communities that make service or marketing decisions for each customer group.
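The SOM-then-k-means idea in this abstract can be sketched in a toy form. This is an illustrative sketch under stated assumptions, not the study's pipeline: the SOM is a small 1-D chain with linearly decaying learning rate and neighborhood width, k-means uses farthest-first initialization, and all function names and default parameters are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(x, vectors):
    """Index of the vector closest to x."""
    return min(range(len(vectors)), key=lambda j: dist(x, vectors[j]))

def train_som(points, n_units=4, epochs=20, lr0=0.5, sigma0=1.0):
    """Step 1: fit a 1-D chain of prototype vectors with online SOM updates."""
    step = max(1, len(points) // n_units)
    protos = [list(points[(j * step) % len(points)]) for j in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks linearly, stays above zero
        lr, sigma = lr0 * decay, max(sigma0 * decay, 0.1)
        for x in points:
            bmu = nearest(x, protos)  # best-matching unit
            for j, w in enumerate(protos):
                # Units near the BMU on the chain move more than distant ones.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for dim in range(len(w)):
                    w[dim] += lr * h * (x[dim] - w[dim])
    return protos

def kmeans(vectors, k, iters=20):
    """Step 2: Lloyd's k-means with farthest-first initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(dist(v, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(v, centers) for v in vectors]
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(v, centers) for v in vectors]

def two_step(points, n_units=4, k=2):
    """Cluster the prototypes, then label each point via its nearest prototype."""
    protos = train_som(points, n_units)
    proto_labels = kmeans(protos, k)
    return [proto_labels[nearest(x, protos)] for x in points], protos
```

Because k-means runs on a handful of prototypes rather than on every observation, a single extreme observation has limited influence on the final cluster centers, which is one plausible reading of how such a design mitigates outliers.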

Unification of neural network with a hierarchical pattern recognition

  • Park, Chang-Mock;Wang, Gi-Nam
    • Proceedings of the ESK Conference / 1996.10a / pp.197-205 / 1996
  • A unification of neural networks with hierarchical pattern recognition is presented for recognizing large sets of objects. A two-step identification procedure is developed for pattern recognition: coarse and fine identification. Coarse identification finds the class of an object, while fine identification identifies a specific object. During the training phase, a coarse neural network is trained to cluster a large set of reference objects into a number of groups. For the fine stage, an expert neural network is also trained to identify a specific object within a group. The presented idea can be interpreted as two-step identification. Experimental results are given to verify the proposed methodology.

A Stigmergy-and-Neighborhood Based Ant Algorithm for Clustering Data

  • Lee, Hee-Sang;Shim, Gyu-Seok
    • Management Science and Financial Engineering / v.15 no.1 / pp.81-96 / 2009
  • Data mining, especially clustering, is one of the exciting research areas for ant-based algorithms. Ant clustering algorithms, however, have many difficulties in handling practical clustering situations. We propose a new grid-based ant colony algorithm for clustering data. Previous ant-based clustering algorithms usually tried to find clusters while the ants pick up or drop items, using some stigmergy information. In our ant clustering algorithm, we make the ants reflect neighborhood information within the storage nests. We use two ant classes, search ants and labor ants. In the initial step of the proposed algorithm, the search ants try to guide the characteristics of the storage nests. Then the labor ants try to classify the items using the guide information set by the search ants and the stigmergy information set by other labor ants. In this procedure, the ants' clustering decisions are guided quickly and kept out of stagnation. We compared our algorithm experimentally with other known algorithms on known and statistically generated data. These experiments show that the suggested ant mining algorithm found the clusters quickly and effectively compared with a known ant clustering algorithm.

Reconstruction from Feature Points of Face through Fuzzy C-Means Clustering Algorithm with Gabor Wavelets (FCM 군집화 알고리즘에 의한 얼굴의 특징점에서 Gabor 웨이브렛을 이용한 복원)

  • 신영숙;이수용;이일병;정찬섭
    • Korean Journal of Cognitive Science / v.11 no.2 / pp.53-58 / 2000
  • This paper reconstructs local regions of a facial expression image from feature points extracted using the FCM (Fuzzy C-Means) clustering algorithm with Gabor wavelets. Feature extraction from a face takes two steps. In the first step, we extract the edges of the main components of the face using the average value of the 2-D Gabor wavelet coefficient histogram of the image; in the next step, we extract the final feature points from the extracted edge information using the FCM clustering algorithm. This study shows that the principal components of facial expression images can be reconstructed from only a few feature points extracted by the FCM clustering algorithm. The approach can also be applied to object recognition as well as facial expression recognition.
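For readers unfamiliar with FCM, the standard alternating update of memberships and centers can be sketched as below. This is a generic textbook-style sketch and says nothing about how the paper combines FCM with Gabor wavelet edges; the function name `fcm`, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions.

```python
import math

def fcm(points, centers, m=2.0, iters=50):
    """Fuzzy C-Means from given initial centers.

    Returns (centers, u), where u[k][i] is the membership of point k in
    cluster i, computed just before the final center update.
    """
    centers = [list(c) for c in centers]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        u = []
        for x in points:
            d = [max(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c))), 1e-12)
                 for c in centers]  # floor avoids division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** power for j in range(len(d)))
                      for i in range(len(d))])
        # Center update: mean of the points weighted by u^m.
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers[i] = [sum(wk * x[dim] for wk, x in zip(w, points)) / total
                          for dim in range(len(points[0]))]
    return centers, u
```

Unlike k-means, every point contributes to every center, with a weight that falls off with distance; each row of `u` sums to 1, and taking the largest entry per row gives a hard assignment when one is needed.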

Genomic Tree of Gene Contents Based on Functional Groups of KEGG Orthology

  • Kim Jin-Sik;Lee Sang-Yup
    • Journal of Microbiology and Biotechnology / v.16 no.5 / pp.748-756 / 2006
  • We propose a genome-scale clustering approach to identify whole genome relationships using the functional groups given by the Kyoto Encyclopedia of Genes and Genomes Orthology (KO) database. The metabolic capabilities of each organism were defined by the number of genes in each functional category. The archaeal, bacterial, and eukaryotic genomes were compared by simultaneously applying a two-step clustering method, comprising a self-organizing tree algorithm followed by unsupervised hierarchical clustering. The clustering results were consistent with various phenotypic characteristics of the organisms analyzed and, additionally, showed a different aspect of the relationships between genomes previously established through rRNA-based comparisons. By collecting and clustering the metabolic functional capabilities of organisms, the proposed approach should be a useful tool for predicting relationships among organisms.

Development of Mining model through reproducibility assessment in Adverse drug event surveillance system (약물부작용감시시스템에서 재현성 평가를 통한 마이닝 모델 개발)

  • Lee, Young-Ho;Yoon, Young-Mi;Lee, Byung-Mun;Hwang, Hee-Joung;Kang, Un-Gu
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.183-192 / 2009
  • ADESS (Adverse Drug Event Surveillance System) is a system that identifies adverse drug events using adverse drug signals. The system is more effective in adverse drug surveillance than current methods such as voluntary reporting or chart review. In this study, we built a clinical data mart (CDM) for the development of ADESS. The CDM achieved data reliability through data quality management, and the most suitable number of clusters (n=4) was obtained through reproducibility assessment with unsupervised learning techniques of knowledge discovery. In the analysis, K-means, Kohonen, and two-step clustering models were built with this number of clusters (n=4), and we confirmed that the K-means algorithm produced the clustering closest to the actual adverse drug events.