• Title/Abstract/Keywords: K means clustering

Search results: 1,107 items (processing time: 0.029 s)

A Study on Representative Skyline Using Connected Component Clustering

  • Choi, Jong-Hyeok;Nasridinov, Aziz
    • Journal of Multimedia Information System / Vol. 6, No. 1 / pp. 37-42 / 2019
  • Skyline queries are used in a variety of fields to make optimal decisions. However, as the volume and dimensionality of the data increase, both the number of skyline points and the time required to discover them increase. Because the number of skyline points matters in many real-life applications, various studies have addressed this problem. However, previous studies have used k-parameter methods such as top-k and k-means to discover representative skyline points (RSPs) from the entire skyline point set, resulting in high query response time and reduced representativeness due to the dependency on k. To solve this problem, we propose a new Connected Component Clustering based Representative Skyline Query (3CRS) that can discover RSPs quickly even in high-dimensional data through connected component clustering. 3CRS performs fast discovery and clustering of skylines through hash indexes and connected components and selects RSPs from each cluster. This paper demonstrates the superiority of the proposed method by comparing it with representative skyline queries using k-means and DBSCAN on a real-world dataset.
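
For readers unfamiliar with skyline queries, the following is a minimal sketch of the basic dominance test that defines a skyline. It is not the 3CRS method from the abstract (no hash indexes, no connected component clustering); the function names, the sample data, and the "smaller is better" convention are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every dimension and
    strictly better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
print(skyline(hotels))  # [(50, 8), (60, 3), (80, 1)]
```

The skyline grows quickly with data volume and dimensionality, which is why representative-skyline methods such as the one described above select only a few points from it.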

Data Clustering Method Using a Modified Gaussian Kernel Metric and Kernel PCA

  • Lee, Hansung;Yoo, Jang-Hee;Park, Daihee
    • ETRI Journal / Vol. 36, No. 3 / pp. 333-342 / 2014
  • Most hyper-ellipsoidal clustering (HEC) approaches use the Mahalanobis distance as a distance metric. It has been proven that HEC, under this condition, cannot be realized since the cost function of partitional clustering is a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters) and propose a practical HEC algorithm that efficiently handles clusters that are ellipsoidal in shape and differ in size and density. We then refine the HEC algorithm by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods lead to a significant improvement in the clustering results over the K-means algorithm, the fuzzy C-means algorithm, the GMM-EM algorithm, and the HEC algorithm based on minimum-volume ellipsoids using the Mahalanobis distance.

개선된 PSO방법에 의한 학술연구조성사업 논문의 효과적인 분류 방법과 그 효과성에 관한 실증분석 (An Empirical Analysis Approach to Investigating Effectiveness of the PSO-based Clustering Method for Scholarly Papers Supported by the Research Grant Projects)

  • 이건창;서영욱;이대성
    • 지식경영연구 (Knowledge Management Research) / Vol. 10, No. 4 / pp. 17-30 / 2009
  • This study proposes a new clustering algorithm to evaluate the value of papers that were supported by research grants from the Korea Research Fund (KRF). The algorithm is based on an extended version of the conventional PSO (Particle Swarm Optimization) mechanism. Specifically, the proposed algorithm, named KASA-PSO, integrates the k-means algorithm and a simulated annealing mechanism. To evaluate the robustness of KASA-PSO, its clustering results were assessed by research grant experts working at KRF. Empirical results revealed that the proposed KASA-PSO clustering method yields better results than the conventional clustering method.


클러스터링 성능평가: 신경망 및 통계적 방법 (A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method)

  • 윤석환;신용백
    • 기술사 (Journal of the Korean Professional Engineers Association) / Vol. 29, No. 2 / pp. 71-79 / 1996
  • This paper evaluates the clustering performance of a neural network method and a statistical method. The algorithms used are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. As a result, the mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
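
As a rough illustration of the comparison measure mentioned above, here is a minimal sketch that assumes "Rand's c statistic" refers to the standard Rand index: the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name and the sample labels are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of point pairs that the two labelings treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # 1.0 (same partition, labels renamed)
print(rand_index(truth, [0, 0, 0, 1]))  # 0.5
```

Because the index depends only on pair co-membership, it is unaffected by how cluster labels are numbered, which makes it suitable for comparing clustering output against known classes such as the IRIS species.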


클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석 (Analysis of Document Clustering Varying Cluster Centroid Decisions)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • 대한전자공학회 (Institute of Electronics Engineers of Korea): Conference Proceedings / Proceedings of the 2002 Summer Conference (3) / pp. 99-102 / 2002
  • The K-means clustering algorithm is a very popular clustering technique used in the field of information retrieval. In this paper, we address the problem of how the K-means algorithm creates centroids and suggest a method that reflects document features and considers the context of each document when determining new centroids. For the experiment, we used an automatic document summarizer to summarize the Reuters-21578 newswire test dataset and achieved a 20% improvement in recall.


Fuzzy c-Means Clustering Algorithm with Pseudo Mahalanobis Distances

  • ICHIHASHI, Hidetomo;OHUE, Masayuki;MIYOSHI, Tetsuya
    • 한국지능시스템학회 (Korean Institute of Intelligent Systems): Conference Proceedings / 한국퍼지및지능시스템학회 (Korea Fuzzy Logic and Intelligent Systems Society), 1998, The Third Asian Fuzzy Systems Symposium / pp. 148-152 / 1998
  • Gustafson and Kessel proposed a modified fuzzy c-Means algorithm based on the Mahalanobis distance. Though the algorithm appears more natural through the use of a fuzzy covariance matrix, it needs to calculate determinants and inverses of the c fuzzy scatter matrices. This paper proposes a fuzzy clustering algorithm using a pseudo Mahalanobis distance, which is easier to use and more flexible than Gustafson and Kessel's fuzzy c-Means.


Prediction and visualization of CYP2D6 genotype-based phenotype using clustering algorithms

  • Kim, Eun-Young;Shin, Sang-Goo;Shin, Jae-Gook
    • Translational and Clinical Pharmacology / Vol. 25, No. 3 / pp. 147-152 / 2017
  • This study focused on the role of cytochrome P450 2D6 (CYP2D6) genotypes to predict phenotypes in the metabolism of dextromethorphan. CYP2D6 genotypes and metabolic ratios (MRs) of dextromethorphan were determined in 201 Koreans. Unsupervised clustering algorithms, hierarchical and k-means clustering analysis, and color visualizations of CYP2D6 activity were performed on a subset of 130 subjects. A total of 23 different genotypes were identified, five of which were observed in one subject. Phenotype classifications were based on the means, medians, and standard deviations of the log MR values for each genotype. Color visualization was used to display the mean and median of each genotype as different color intensities. Cutoff values were determined using receiver operating characteristic curves from the k-means analysis, and the data were validated in the remaining subset of 71 subjects. Using the two highest silhouette values, the selected numbers of clusters were three (the best) and four. The findings from the two clustering algorithms were similar to those of other studies, classifying $^*5/^*5$ as a lowest activity group and genotypes containing duplicated alleles (i.e., $CYP2D6^*1/^*2N$) as a highest activity group. The validation of the k-means clustering results with data from the 71 subjects revealed relatively high concordance rates: 92.8% and 73.9% in three and four clusters, respectively. Additionally, color visualization allowed for rapid interpretation of results. Although the clustering approach to predict CYP2D6 phenotype from CYP2D6 genotype is not fully complete, it provides general information about the genotype to phenotype relationship, including rare genotypes with only one subject.
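
The abstract above selects the number of clusters from silhouette values. As a minimal sketch of that idea, assuming one-dimensional inputs (such as log MR values) and absolute difference as the distance, the mean silhouette of a labeling could be computed as follows. The function name and the sample values are hypothetical and are not data from the study.

```python
from typing import Dict, Hashable, List, Sequence


def mean_silhouette(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient for 1-D values under a given labeling."""
    clusters: Dict[Hashable, List[float]] = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
well_separated = [0, 0, 0, 1, 1, 1]
mixed_up = [0, 1, 0, 1, 0, 1]
print(mean_silhouette(values, well_separated) > mean_silhouette(values, mixed_up))  # True
```

In practice, one would run k-means for several candidate values of k and prefer the k whose labeling gives the highest mean silhouette.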

K-shape 군집화 기반 블랙-리터만 포트폴리오 구성 (Black-Litterman Portfolio with K-shape Clustering)

  • 김예지;조풍진
    • 산업경영시스템학회지 (Journal of Society of Korea Industrial and Systems Engineering) / Vol. 46, No. 4 / pp. 63-73 / 2023
  • This study explores modern portfolio theory by integrating the Black-Litterman portfolio with time-series clustering, specifically emphasizing the K-shape clustering methodology. K-shape clustering groups time-series data effectively and, when combined with the Black-Litterman portfolio, enhances the ability to plan and manage investments in stock markets. Based on the patterns of stock markets, the objective is to understand the relationship between past market data and future investment strategies through backtesting. Additionally, by examining diverse learning and investment periods, optimal strategies are identified that boost portfolio returns while efficiently managing the associated risks. For comparative analysis, the traditional Markowitz portfolio is also assessed in conjunction with clustering techniques utilizing K-Means and K-Means with Dynamic Time Warping. The results suggest that the combination of K-shape and the Black-Litterman model significantly enhances portfolio optimization in the stock market, providing valuable insights for making stable portfolio investment decisions. The achieved Sharpe ratio of 0.722 indicates significantly higher performance than the other benchmarks, underlining the effectiveness of the K-shape and Black-Litterman integration in portfolio optimization.

클러스터 중심 왜곡 저감을 위한 클러스터링 기법 (Clustering Method for Reduction of Cluster Center Distortion)

  • 정혜천;서석태;이인근;권순학
    • 한국지능시스템학회논문지 (Journal of the Korean Institute of Intelligent Systems) / Vol. 18, No. 3 / pp. 354-359 / 2008
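  • Clustering is a technique that organizes data with similar properties, drawn from a given arbitrary data set, into multiple groups. Many techniques, such as K-Means, Fuzzy C-Means (FCM), and the Mountain Method (MM), have been proposed for this purpose and are widely used. However, these techniques share the drawback that the clustering result varies greatly with the initial values. In particular, FCM, the most widely used technique, is vulnerable to noisy data, and because it minimizes the within-cluster variance of the given input data, it distorts the cluster centers. This paper proposes a clustering technique that reduces cluster center distortion and is not affected by initial values, through proportional merging of neighboring data based on data weights. The usefulness of the proposed technique is confirmed by comparing the cluster centers obtained with FCM against those obtained with the proposed technique.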

카메라 획득 영상에서의 색 분산 및 개선된 K-means 색 병합을 이용한 텍스트 영역 추출 및 이진화 (Text Detection and Binarization using Color Variance and an Improved K-means Color Clustering in Camera-captured Images)

  • 송영자;최영우
    • 정보처리학회논문지B (The KIPS Transactions: Part B) / Vol. 13B, No. 3 / pp. 205-214 / 2006
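  • Text contained in an image is information that expresses the content of the image both implicitly and concretely; if such information can be found and recognized in real time, it can be used in a variety of applications. This paper proposes a new method for extracting text from various kinds of camera-captured images and a new method for separating the text within the extracted regions. For text region extraction, we propose color variance in the RGB color space as a feature, and for text region separation, we propose an improved K-means color clustering in the RGB color space. The experiments used various document-type images captured with digital cameras and mobile phone cameras, general indoor and outdoor natural images, and a portion of the ICDAR contest [1] images.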