• Title/Abstract/Keywords: K-means cluster

Fast K-Means Clustering Algorithm using Prediction Data

  • 지태창;이현진;이일병
    • 한국콘텐츠학회논문지 (Journal of the Korea Contents Association) / Vol. 9, No. 1 / pp.106-114 / 2009
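  • This paper proposes a method for running the K-Means clustering algorithm faster. The key feature of the proposed algorithm is that, to improve speed, it predicts which data points are likely to change cluster. At each step of the clustering algorithm, only the data points whose cluster membership may change are selected, and only their distances to the cluster centers are computed, which reduces the total clustering time. The prediction reuses the distance information that K-Means already generates as it runs, so the added computation is small; and because it relies on distance information, the algorithm is less affected by the number of dimensions. To evaluate performance, the proposed algorithm was compared with Lloyd's, the original K-Means, and with KMHybrid, an improved variant. For large-scale data (large input size, high dimensionality, and many clusters), the proposed algorithm achieved a greater speedup than both Lloyd's and KMHybrid.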
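The idea of recomputing distances only for points that might change cluster can be illustrated with a minimal sketch. This is not the authors' algorithm, whose prediction rule the abstract does not specify; it swaps in a well-known distance-bound filter (an upper bound on the distance to the assigned center and a lower bound on the distance to any other center). The function names `two_nearest` and `kmeans_filtered` and the `skipped` counter are hypothetical, introduced only for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_nearest(p, centers):
    """Return (nearest center index, its distance, distance to the runner-up)."""
    ds = [dist(p, c) for c in centers]
    j = min(range(len(ds)), key=ds.__getitem__)
    second = min((d for i, d in enumerate(ds) if i != j), default=math.inf)
    return j, ds[j], second


def kmeans_filtered(points, init_centers, max_iter=100):
    """K-means that recomputes distances only for points that might switch cluster.

    upper[i]: upper bound on the distance from point i to its assigned center.
    lower[i]: lower bound on the distance from point i to any other center.
    While upper[i] < lower[i], point i provably keeps its label and is skipped.
    (Illustrative bound-based filter, not the rule used in the paper.)
    """
    centers = [list(c) for c in init_centers]
    first = [two_nearest(p, centers) for p in points]
    labels = [f[0] for f in first]
    upper = [f[1] for f in first]
    lower = [f[2] for f in first]
    skipped = 0
    for _ in range(max_iter):
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        moved = [dist(a, b) for a, b in zip(centers, new_centers)]
        centers = new_centers
        biggest = max(moved)
        if biggest == 0.0:
            break  # centers are stable, so the labels are final
        for i, p in enumerate(points):
            # Loosen both bounds by how far the centers travelled.
            upper[i] += moved[labels[i]]
            lower[i] -= biggest
            if upper[i] < lower[i]:
                skipped += 1  # predicted not to change: no distance computation
                continue
            labels[i], upper[i], lower[i] = two_nearest(p, centers)
    return labels, centers, skipped
```

Because a point is skipped only when its assigned center is provably the closest, this sketch should produce the same labels as plain Lloyd iterations from the same starting centers, while `skipped` counts how many per-point distance evaluations were avoided.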

Analysis of Brokerage Commission Policy based on the Potential Customer Value

  • 신형원;손소영
    • 산업공학 (Industrial Engineering) / Vol. 16, No. spc / pp.123-126 / 2003
  • In this paper, we use three clustering algorithms (K-means, Self-Organizing Map, and Fuzzy K-means) to find properly graded stock market brokerage commission rates based on the cumulative transactions on both the stock exchange market and HTS (Home Trading System). Stock trading investors in both modes are classified in terms of the total transaction amount as well as the corresponding mode of investment. Empirical analysis results indicated that fuzzy K-means cluster analysis is the best fit, in terms of robustness, for segmenting the customers of both transaction modes. We then propose rules for grouping customers into three groups based on a decision tree, and apply brokerage commissions of 0.4%, 0.45%, and 0.5% for the exchange market and 0.06%, 0.1%, and 0.18% for HTS.

An Improved Hybrid Canopy-Fuzzy C-Means Clustering Algorithm Based on MapReduce Model

  • Dai, Wei;Yu, Changjun;Jiang, Zilong
    • Journal of Computing Science and Engineering / Vol. 10, No. 1 / pp.1-8 / 2016
  • The fuzzy c-means (FCM) algorithm is widely used at present. However, its clustering quality and convergence rate depend on the initial cluster centers, so an improved FCM algorithm based on the canopy clustering concept has been proposed to analyze datasets quickly. Taking advantage of the canopy algorithm's rapid acquisition of cluster centers, the algorithm uses the canopy clustering results as its input, which accelerates the convergence of FCM. In addition, a MapReduce scheme for the proposed FCM algorithm is designed for a cloud environment. Experimental results demonstrate that the hybrid canopy-FCM clustering algorithm run on MapReduce achieves better clustering quality and higher operation speed.
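The canopy step that supplies initial centers can be sketched as follows. This is a single-machine illustration of the general canopy technique with a loose threshold `t1` and a tight threshold `t2`, not the paper's MapReduce implementation; the function name `canopy_centers`, the thresholds, and the choice of the canopy mean as the center are assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Group points into canopies and return one mean center per canopy.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (points this close to a canopy seed can no longer seed a new canopy).
    """
    if not 0 <= t2 < t1:
        raise ValueError("thresholds must satisfy 0 <= t2 < t1")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        members = [p for p in remaining if dist(seed, p) <= t1]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # The seed itself is at distance 0 <= t2, so the list always shrinks.
        remaining = [p for p in remaining if dist(seed, p) > t2]
    return centers
```

The returned centers would then be passed to FCM (or K-means) as the starting centers instead of a random initialization; the number of clusters falls out of the thresholds rather than being fixed in advance.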

Double K-Means Clustering

  • 허명회
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 13, No. 2 / pp.343-352 / 2000
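  • K-means clustering is a non-hierarchical clustering method known to be efficient for clustering observations in large data sets. However, it often makes the error of handing part of a relatively homogeneous large cluster over to a small cluster. This study identifies that phenomenon precisely and proposes "double K-means clustering" as a remedy. The new method is also applied to an empirical case and discussed.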

Assessment of Premature Ventricular Contraction Arrhythmia by K-means Clustering Algorithm

  • Kim, Kyeong-Seop
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 22, No. 5 / pp.65-72 / 2017
  • Premature ventricular contraction (PVC) arrhythmia is the most common abnormal heart rhythm and may increase the mortality risk of a cardiac patient. It is therefore very important to identify the particular features of the PVC pattern, especially from the patient. In this paper, we propose a new method to extract the characteristics of the PVC pattern by applying the K-means machine learning algorithm to heart rate variability depicted in a Poincaré plot. To quantitatively distinguish the cluster patterns of normal sinus rhythm from those of PVC beats, the Euclidean distance between the clusters was measured. Experimental simulations on the MIT-BIH arrhythmia database show that this distance measure is valid for differentiating the pattern traits of PVC beats. The proposed method therefore offers a simple way to identify the attributes of PVC beats in terms of K-means clusters, especially in long-duration electrocardiograms (ECG).
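The pipeline described here (Poincaré plot points, K-means, then a Euclidean distance between cluster centers) can be sketched roughly as below. The RR intervals are made-up values for illustration only, not MIT-BIH data, and the helper names `poincare_points` and `kmeans`, the choice of two clusters, and the starting centers are all assumptions rather than details taken from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def poincare_points(rr):
    """Pair each RR interval with its successor: (RR_n, RR_{n+1})."""
    return list(zip(rr[:-1], rr[1:]))


def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations; returns the centers and the last grouping of points."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (empty groups stay put).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Made-up RR intervals in seconds: mostly regular beats, with two short
# intervals each followed by a long pause (synthetic, for illustration only).
rr = [0.80, 0.82, 0.79, 0.81, 0.50, 1.10, 0.80, 0.81, 0.79, 0.52, 1.08, 0.80]
points = poincare_points(rr)
centers, groups = kmeans(points, [points[0], points[4]])
separation = dist(centers[0], centers[1])
print(f"{len(points)} points, centroid separation = {separation:.3f} s")
```

A larger separation between the two centroids indicates that the irregular beats sit farther from the main cloud of regular beats in the Poincaré plane, which is the kind of quantity the abstract uses to tell the patterns apart.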

A Type 2 Fuzzy C-means

  • 황철;이정훈
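    • 한국지능시스템학회:학술대회논문집 (Korean Institute of Intelligent Systems: Conference Proceedings) / 한국퍼지및지능시스템학회 2001년도 춘계학술대회 학술발표 논문집 (Proceedings of the Korea Fuzzy and Intelligent Systems Society 2001 Spring Conference) / pp.16-19 / 2001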
  • This paper presents a type-2 fuzzy C-means (FCM) algorithm that is an extension of the conventional fuzzy C-means algorithm. In our proposed method, the membership values for each pattern are extended as type-2 fuzzy memberships by assigning membership grades to the type-1 memberships. In doing so, cluster centers that are estimated by type-2 memberships may converge to a more desirable location than cluster centers obtained by a type-1 FCM method in the presence of noise.
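For context, the conventional (type-1) FCM that this work extends alternates between the two standard updates below. The notation is assumed here rather than taken from the paper: $x_i$ are the $N$ patterns, $v_j$ the $C$ cluster centers, $u_{ij}$ the membership of pattern $i$ in cluster $j$, and $m > 1$ the fuzzifier. The type-2 extension described in the abstract is not reproduced.

```latex
% Objective minimized by conventional FCM, with memberships summing to one
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{C} u_{ij} = 1 \quad \text{for each } i

% Membership update (for x_i not coinciding with any center)
u_{ij} = \left[ \sum_{k=1}^{C}
  \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}

% Center update
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

Because every pattern, including a noisy one, must distribute a total membership of one across the clusters, outliers pull on the centers in the type-1 method; that sensitivity is what the abstract's type-2 memberships aim to reduce.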

Environmental Survey Data Modeling Using K-means Clustering Techniques

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 3 / pp.557-566 / 2005
  • Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering techniques to obtain environmental information. The outputs of the k-means clustering can be used for environmental preservation and environmental improvement.

Environmental Survey Data Modeling using K-means Clustering Techniques

  • 박희창;조광현
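    • 한국데이터정보과학회:학술대회논문집 (Korean Data and Information Science Society: Conference Proceedings) / 한국데이터정보과학회 2004년도 추계학술대회 (Korean Data and Information Science Society 2004 Autumn Conference) / pp.77-86 / 2004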
  • Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering techniques to obtain environmental information. The outputs of the k-means clustering can be used for environmental preservation and environmental improvement.

Pattern Analysis of Volume of Basal Ganglia Structures in Patients with First-Episode Psychosis

  • 민세리;이태영;곽유빈;권준수
    • 생물정신의학 (Korean Journal of Biological Psychiatry) / Vol. 25, No. 2 / pp.38-43 / 2018
  • Objectives: Dopamine dysregulation has been regarded as one of the core pathologies in patients with schizophrenia. Since dopamine synthesis capacity has been found to be inconsistent in patients with schizophrenia, the current classification of patients based on clinical symptoms cannot reflect the neurochemical heterogeneity of the disease. Here we performed a new subtyping of patients with first-episode psychosis (FEP) through biotype-based cluster analysis. We specifically proposed basal ganglia structural changes, which are deeply involved in the dopaminergic circuit, as a biotype. Methods: Forty FEP patients and 40 demographically matched healthy participants underwent 3T T1 MRI. Whole-brain parcellation was conducted, and the volumes of a total of 6 basal ganglia regions were extracted as features for cluster analysis. We used K-means clustering, and external validation was conducted with the Positive and Negative Syndrome Scale (PANSS). Results: K-means clustering divided the 40 FEP subjects into 2 clusters. Cluster 1 (n = 25) showed a substantial volume decrease in 4 regions of the basal ganglia compared to Cluster 2 (n = 15). Cluster 1 showed higher scores on the PANSS positive scale than Cluster 2 (F = 2.333, p = 0.025). Compared to healthy controls, Cluster 1 showed smaller volumes in 4 regions, whereas Cluster 2 showed larger volumes in 3 regions. Conclusions: Cluster analysis found two subgroups that differed distinctly in the volume patterns of basal ganglia structures and in positive symptom severity. The result possibly reflects the neurobiological heterogeneity of schizophrenia. Thus, the current study supports the importance of a paradigm shift toward biotype-based rather than phenotype-based diagnosis for future precision psychiatry.

A Study on Effective Selection of University Lecture Evaluation

  • 황세명;김인택
    • 공학교육연구 (Journal of Engineering Education Research) / Vol. 8, No. 1 / pp.31-45 / 2005
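  • This paper compares methods of extracting representative items in order to obtain the questionnaire items needed for lecture evaluation effectively and systematically. The methods compared are factor analysis (FA), the FCM (fuzzy c-means) algorithm, and cluster analysis (CA); each is used to extract a small number of items from a large pool of candidate items of various forms. The extracted items serve as representatives of the clusters formed by the larger set of items. To this end, a 120-item lecture evaluation form compiled from several questionnaires was administered to 646 students at Myongji University and three other universities. Students answered each item on a six-level scale: "strongly agree", "agree", "neutral", "disagree", "strongly disagree", and "not applicable". By analyzing the students' response tendencies for each item, about 25 items were extracted. The experiments showed that the techniques compared in this paper (factor analysis, the FCM algorithm, and cluster analysis) extracted very similar sets of items.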