

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal / v.39 no.4 / pp.443-454 / 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.
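  The purity and entropy figures quoted above are external cluster-quality measures. The abstract does not spell out the formulas used, so the following is only a minimal sketch of the commonly used definitions (majority-label purity, and size-weighted label entropy in bits); the function names and the toy labels are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from math import log2


def _label_counts(clusters, labels):
    """Group reference labels by cluster id and count them."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    counts = _label_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the label entropy (in bits) of each cluster."""
    counts = _label_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for cnt in counts.values():
        size = sum(cnt.values())
        h = -sum((v / size) * log2(v / size) for v in cnt.values())
        total += (size / n) * h
    return total


# Toy example (hypothetical app categories): cluster 0 is mixed, cluster 1 is pure.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["game", "game", "photo", "photo", "photo", "photo"]
print(purity(clusters, labels), entropy(clusters, labels))
```

  Higher purity and lower entropy both indicate clusters that agree better with the reference labels, which matches the direction of the improvements reported in the abstract.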

Assessment of Premature Ventricular Contraction Arrhythmia by K-means Clustering Algorithm

  • Kim, Kyeong-Seop
    • Journal of the Korea Society of Computer and Information / v.22 no.5 / pp.65-72 / 2017
  • Premature ventricular contraction (PVC) arrhythmia is the most common abnormal heart rhythm and may increase the mortality risk of a cardiac patient. It is therefore important to identify the characteristic features of the PVC pattern in a patient. In this paper, we propose a new method for extracting the characteristics of the PVC pattern by applying the K-means machine learning algorithm to heart rate variability depicted in a Poincaré plot. To quantitatively distinguish the cluster patterns of normal sinus rhythm from those of PVC beats, the Euclidean distance between the clusters was computed. Experimental simulations on the MIT-BIH arrhythmia database show that this inter-cluster distance measure is valid for differentiating the pattern traits of PVC beats. The proposed method therefore offers a simple way to identify the attributes of PVC beats in terms of K-means clusters, especially in long-period electrocardiograms (ECG).
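  The pipeline described above (Poincaré plot, K-means, inter-cluster Euclidean distance) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the RR-interval values are made up, the plain Lloyd-style `kmeans` helper and its farthest-point initialization are hypothetical choices, and "distance between the clusters" is taken to mean the distance between the two centroids.

```python
from math import dist


def poincare_points(rr):
    """Pair each RR interval with its successor: (RR_n, RR_{n+1})."""
    return [(rr[i], rr[i + 1]) for i in range(len(rr) - 1)]


def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given initial centers."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old center if a cluster is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Made-up RR intervals in seconds; the short/long pairs mimic ectopic beats.
rr = [0.80, 0.82, 0.79, 0.81, 0.50, 1.10, 0.80, 0.81, 0.79, 0.52, 1.08, 0.80]
pts = poincare_points(rr)

# Assumed initialization: the first point and the point farthest from it.
init = [pts[0], max(pts, key=lambda p: dist(p, pts[0]))]
centers, groups = kmeans(pts, init)
separation = dist(centers[0], centers[1])  # inter-centroid Euclidean distance
print(centers, separation)
```

  A larger centroid separation would then suggest that the beat-to-beat dynamics split into clearly distinct groups, which is the kind of quantity the abstract uses to tell PVC beats from normal sinus rhythm.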

Use of Factor Analyzer Normal Mixture Model with Mean Pattern Modeling on Clustering Genes

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.113-123 / 2006
  • The normal mixture model (NMM) is frequently used to cluster genes in microarray gene expression data. In this paper, some of the component means of the NMM are modelled by a linear regression model whose design matrix represents the pattern between sample classes in the microarray matrix. Modelling the component means with given design matrices has the advantage that the clusters can be steered toward previously designed patterns. However, it suffers from an overfitting problem because, in practice, gene data are often high-dimensional. This problem also arises when the NMM restricted by the linear model for the component means is fitted. To cope with this problem, this paper proposes using the factor analyzer NMM restricted by a linear model to cluster genes. Several design matrices that are useful for clustering genes are also provided.

Improved Classification Algorithm using Extended Fuzzy Clustering and Maximum Likelihood Method

  • Jeon, Young-Joon;Kim, Jin-Il
    • Proceedings of the IEEK Conference / summer / pp.447-450 / 2004
  • This paper proposes a classification method for remotely sensed images based on a fuzzy c-means clustering algorithm that uses the average intra-cluster distance. The average intra-cluster distance is the average over the set of vectors belonging to each cluster and is proportional to the cluster's size and density. We classify each pixel according to its membership grade with respect to the fuzzy c-means cluster centers, which are obtained from the mean values of the training data for each class. The fuzzy c-means algorithm takes into account the inter-cluster membership degree for each class. We then evaluate the degree of overlap between clusters. A pixel with a high degree of overlap is passed to the maximum likelihood classification method. Finally, we decide its category by comparing the fuzzy membership degree with the likelihood. The proposed method is applied to an IKONOS remote sensing satellite image for verification.
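  To make the notion of a pixel's membership grade concrete, here is a minimal sketch of the standard fuzzy c-means membership formula with fixed cluster centers. It does not implement the paper's extension based on average intra-cluster distance. The `is_ambiguous` helper and its `margin` threshold are hypothetical stand-ins for the abstract's "high degree of overlap" test, which the abstract does not define.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1).

    u_i = d_i^(-2/(m-1)) / sum_k d_k^(-2/(m-1)), where d_i is the Euclidean
    distance from the point to center i.
    """
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with one or more centers: share membership there.
        hits = d.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in d]
    power = 2.0 / (m - 1.0)
    inv = [x ** -power for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def is_ambiguous(memberships, margin=0.2):
    """Hypothetical overlap test: the two largest memberships are close."""
    top = sorted(memberships, reverse=True)
    return len(top) > 1 and (top[0] - top[1]) < margin


# Two hypothetical class centers in a 2-band feature space.
centers = [(0.0, 0.0), (3.0, 0.0)]
for pixel in [(1.0, 0.0), (1.5, 0.0)]:
    u = fcm_memberships(pixel, centers)
    print(pixel, u, "ambiguous" if is_ambiguous(u) else "clear")
```

  In a scheme like the one described, pixels flagged as ambiguous would be handed to a second classifier (maximum likelihood in the abstract), while clearly assigned pixels keep their fuzzy label.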


DNA Marker Mining of BMS1167 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won;Kwon, Jae-Chul
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.325-333 / 2006
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS1167 resulted in three cluster groups. We conclude that the major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 are markers 100 bp, 108 bp, and 110 bp.


A Major DNA Marker Mining of BMS941 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.913-921 / 2005
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS941 resulted in three cluster groups. We conclude that the major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 are markers 80 bp, 85 bp, 90 bp, and 105 bp.


RBF Equalizer reducing a Center Estimating Speed (센터 추정 속도를 감축한 RBF 등화기)

  • 권용광;김재공
    • Proceedings of the IEEK Conference / 2001.06a / pp.289-292 / 2001
  • This paper investigates an RBF equalizer (RBFE) that reduces the time needed for center estimation. One method for RBF center estimation is k-means clustering. The performance of an RBFE depends on how well the RBF centers are estimated. We propose an RBF equalizer that uses a modified k-means clustering algorithm (MKMC) to speed up channel estimation and reduce computational complexity. Computer simulations are included to illustrate the analytical results. The proposed method is shown to improve performance by about 1 dB with less training data.


A New Model-Based Clustering Algorithm (새로운 모형기반 군집분석 알고리즘)

  • Park, Jeong-Su;Hwang, Hyeon-Sik
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.97-100 / 2005
  • A new model-based clustering algorithm is proposed. The idea starts from the assumption that observations are realizations of Gaussian processes and are therefore correlated. With a special covariance structure, the posterior probability that an observation belongs to each cluster is computed using the ECM algorithm. Preliminary results of a small-scale simulation study are given for comparison with the k-means clustering algorithm.


A Study on Static Situation Awareness System with the Aid of Optimized Polynomial Radial Basis Function Neural Networks (최적화된 pRBF 뉴럴 네트워크에 의한 정적 상황 인지 시스템에 관한 연구)

  • Oh, Sung-Kwun;Na, Hyun-Suk;Kim, Wook-Dong
    • The Transactions of The Korean Institute of Electrical Engineers / v.60 no.12 / pp.2352-2360 / 2011
  • In this paper, we introduce a comprehensive design methodology for Radial Basis Function Neural Networks (RBFNN) based on clustering and optimization algorithms. A clustering algorithm divides the input dataset into clusters according to similarity. The number of clusters equals the number of nodes in the hidden layer, and the center of each cluster is used as the center of the corresponding receptive field in the hidden layer. In this study, we applied the Fuzzy C-Means (FCM) and K-Means (KM) clustering algorithms and compared them. The connection weights of the model are extended to polynomial functions, such as linear and quadratic ones. For this reason, the output of the model expresses a relation between input and output. To obtain the optimal structure and better performance, Particle Swarm Optimization (PSO) is used. Structural optimization yields the number of clusters and the polynomial order of the connection weights, while parametric optimization yields the widths of the receptive fields. To evaluate the performance of the proposed model, NXT equipment offered by National Instruments (NI) is used. The intelligent model for the situation awareness system was built from an experimental dataset of distances measured between an object and the various sensors of the NXT equipment, such as the sound, light, and ultrasonic sensors.
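  The structure described above (one hidden node per cluster, cluster centers as receptive-field centers, polynomial connection weights) can be illustrated with a small forward pass. This is a sketch under assumptions the abstract does not confirm: Gaussian receptive fields, a linear (first-order) polynomial per node, and an unnormalized weighted sum at the output. The centers, widths, and coefficients are hypothetical values that would in practice come from clustering, PSO, and weight estimation.

```python
from math import dist, exp


def rbf_activations(x, centers, widths):
    """Gaussian receptive fields: one hidden node per cluster center."""
    return [exp(-dist(x, c) ** 2 / (2.0 * w ** 2)) for c, w in zip(centers, widths)]


def prbf_output(x, centers, widths, coeffs):
    """Output with linear polynomial weights.

    coeffs[j] = [a0, a1, ..., ad] gives node j the local model
    a0 + a1*x1 + ... + ad*xd, which is scaled by that node's activation.
    """
    acts = rbf_activations(x, centers, widths)
    out = 0.0
    for phi, a in zip(acts, coeffs):
        local = a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))
        out += phi * local
    return out


# Hypothetical two-node network over 2-D sensor readings.
centers = [(0.0, 0.0), (1.0, 0.0)]   # would come from K-means or FCM
widths = [1.0, 1.0]                  # would be tuned, e.g. by PSO
coeffs = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(prbf_output((0.5, 0.5), centers, widths, coeffs))
```

  Setting every coefficient after `a0` to zero reduces each node to a constant weight, which recovers an ordinary RBF network and shows what the polynomial extension adds.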

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before extracting the daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels between 2008 and 2011. The test data were the daily external-factor data for 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.