• Title/Summary/Keyword: K-Means clustering algorithm

Facial Expression Recognition with Fuzzy C-Means Clustering Algorithm and Neural Network Based on Gabor Wavelets

  • Youngsuk Shin;Chansup Chung;Lee, Yillbyung
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.126-132 / 2000
  • This paper presents a facial expression recognition method based on Gabor wavelets that uses a fuzzy C-means (FCM) clustering algorithm and a neural network. Facial expression features are extracted in two steps. In the first step, a Gabor wavelet representation extracts the edges of major facial components using the average value of the image's 2-D Gabor wavelet coefficient histogram. In the second step, sparse features of facial expressions are extracted from this edge information using the FCM clustering algorithm. The recognition results are compared with dimensional values of internal states derived from semantic ratings of emotion-related words. The dimensional model can recognize not only the six facial expressions related to Ekman's basic emotions but also expressions of various internal states.
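The FCM step named in this abstract alternates two updates: each cluster center becomes the mean of the data weighted by $u_{ik}^m$, and each membership is recomputed from the point's relative distances to all centers. The sketch below is a generic, minimal FCM in plain Python, assuming Euclidean distance and fuzzifier `m = 2`; the function name `fuzzy_c_means` and its parameters are illustrative, and it does not reproduce the paper's Gabor-based feature pipeline.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u) where u[i][k] is the membership of point i in
    cluster k; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / tw
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: give it crisp membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[k] / dists[j]) ** p for j in range(c))
                              for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Taking the highest-membership cluster in each row of `u` turns the fuzzy partition into hard labels when a crisp assignment is needed.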

An Incremental Web Document Clustering Based on the Transitive Closure Tree (이행적 폐쇄트리를 기반으로 한 점증적 웹 문서 클러스터링)

  • Youn Sung-Dae;Ko Suc-Bum
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.1-10 / 2006
  • In document clustering, the k-means algorithm and Hierarchical Agglomerative Clustering (HAC) are often used. The k-means algorithm has the advantage of fast processing, while HAC has the advantage of classification precision. However, each method has the other's weakness: k-means yields lower-quality classification, and HAC is slow. Both methods also share a serious problem: document similarities must be computed whenever a new document is inserted into a cluster. A key property of web resources is that information accumulates as new documents are added frequently. Therefore, we propose a new transitive closure tree method based on HAC that reduces document clustering time, together with a superior incremental clustering method for inserting new documents and deleting documents contained in a cluster. The proposed method is compared with existing algorithms in terms of precision, recall, F-measure, and processing time, and we present the experimental results.
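One way to see why a transitive-closure view can avoid recomputing old similarities: if two documents are linked whenever their similarity reaches a threshold, the clusters are the connected components of that graph, so a new document only has to be compared with the existing ones. The sketch below illustrates that idea with cosine similarity over word counts and a union-find structure. The threshold, whitespace tokenization, and the class name `IncrementalClusterer` are assumptions for illustration; this is not the authors' transitive closure tree, and it handles insertion only (deletion would require recomputing the affected component).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical incremental clusterer: clusters are the connected
    components of the graph whose edges join documents with sim >= threshold.

    Inserting a document compares it only against stored documents and merges
    the components it links to, so earlier pairwise similarities are reused.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.docs = []    # term-frequency vectors, indexed by document id
        self.parent = []  # union-find parent pointers

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def add(self, text):
        vec = Counter(text.lower().split())
        new_id = len(self.docs)
        self.docs.append(vec)
        self.parent.append(new_id)
        for j in range(new_id):
            if cosine(vec, self.docs[j]) >= self.threshold:
                # Union: attach j's component under the new document's root.
                self.parent[self._find(j)] = self._find(new_id)
        return new_id

    def clusters(self):
        groups = {}
        for i in range(len(self.docs)):
            groups.setdefault(self._find(i), []).append(i)
        return sorted(groups.values())
```

Because linking is transitive through the union-find roots, two documents can land in one cluster even when they are only connected through a chain of similar documents, which is the single-linkage behavior a transitive closure implies.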

Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System

  • Zolkepli, Maslina;Dong, Fangyan;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.256-267 / 2014
  • An automatic switch among an ensemble of clustering algorithms is proposed as part of a bibliographic big data retrieval system. It uses a fuzzy inference engine as a decision support tool to select the fastest-performing option among fuzzy C-means (FCM) clustering, Newman-Girvan clustering, and a combination of both. The aim is to achieve the best clustering performance while reducing computational complexity from $O(n^3)$ to $O(n)$. The automatic switch is implemented as a fuzzy logic controller written in Java and accepts three inputs from each clustering result: the number of clusters, the number of vertices, and the time taken to complete the clustering process. Experimental results on a PC (Intel Core i5-3210M at 2.50 GHz) show that the combination of both clustering algorithms is selected as the best-performing algorithm in 20 out of 27 cases, with the highest percentage of 83.99%, completed in 161 seconds. The self-adapted FCM is selected as the best-performing algorithm in 4 cases and Newman-Girvan in 3 cases. The automatic switch will be incorporated into the bibliographic big data retrieval system, which focuses on visualizing fuzzy relationships using a hybrid approach combining the FCM and Newman-Girvan algorithms, and is planned for public release via the Internet.

Approximate k values using Repulsive Force without Domain Knowledge in k-means

  • Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.3 / pp.976-990 / 2020
  • The k-means algorithm is widely used in academia and industry because it is easy and simple to implement and learns quickly even on complex datasets. However, k-means struggles to classify datasets without prior domain knowledge, since the number of clusters k must be specified in advance. In a previous study, we proposed the repulsive k-means (RK-means) algorithm to improve k-means using the concept of a repulsive force, which allows unnecessary cluster centroids to be deleted. Accordingly, RK-means can classify a dataset without domain knowledge. However, three main problems remain: the RK-means algorithm includes a cluster repulsive force offset for clusters confined within other clusters, which can cause cluster locking; we were unable to prove in the previous study that RK-means converges optimally; and RK-means performed well only with specific normalization terms and weights. Therefore, this paper proposes the advanced RK-means (ARK-means) algorithm to resolve these problems. We establish an initialization strategy for deploying cluster centroids and define a metric for the ARK-means algorithm. Finally, we redefine the mass and normalization terms so that they better suit general datasets. We demonstrate the feasibility of ARK-means experimentally using the blob and iris datasets. The experimental results verify that ARK-means performs better than k-means, k'-means, and RK-means.
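For reference, the baseline that RK-means and ARK-means modify is standard k-means (Lloyd's algorithm), in which `k` is a required input. The minimal sketch below assumes Euclidean distance and random initialization from `k` distinct data points; the function name `kmeans` is illustrative, and the repulsive-force mechanism of RK-means is not reproduced here.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on equal-length numeric tuples; returns (centroids, labels).

    The number of clusters k must be supplied by the caller.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```

Because `k` is fixed by the caller, choosing it still requires prior knowledge of the data, which is the limitation the abstract targets.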

Clustering of Decision Making Units using DEA (DEA를 이용한 의사결정단위의 클러스터링)

  • Kim, Kyeongtaek
    • Journal of Korean Society of Industrial and Systems Engineering / v.37 no.4 / pp.239-244 / 2014
  • Conventional clustering approaches are mostly based on minimizing the total dissimilarity of inputs and outputs. However, such approaches may not be helpful for clustering decision making units (DMUs) that convert multiple inputs into multiple outputs, because they ignore the conversion function. Data envelopment analysis (DEA) has been widely applied to estimate the efficiency of such DMUs because it is non-parametric. We propose a new clustering method to identify groups of DMUs that are similar in terms of their input-output profiles. A real-world example is given to explain the use and effectiveness of the proposed method. We then calculate the similarity between its result and the result of a conventional clustering method applied to the same example. Finally, after adding the efficiency value to the input of the K-means algorithm, we calculate a new similarity value and compare it with the previous one.
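The abstract does not name the measure used to compare two clusterings. As one common choice, assumed here purely for illustration, the Rand index is the fraction of item pairs on which two clusterings agree (both put the pair together, or both keep it apart). The function name `rand_index` is hypothetical, and any labels passed to it here are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings of the same items agree.

    A pair agrees if both clusterings put it in one cluster, or both split it.
    Cluster ids need not match between the two labelings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

To compare, say, a DEA-based grouping of DMUs with a K-means grouping, both label lists would be given in the same DMU order; a value of 1.0 means the two partitions are identical up to renaming of clusters.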

An Enhanced Spatial Fuzzy C-Means Algorithm for Image Segmentation (영상 분할을 위한 개선된 공간적 퍼지 클러스터링 알고리즘)

  • Truong, Tung X.;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.49-57 / 2012
  • Conventional fuzzy c-means (FCM) algorithms achieve good clustering performance. However, they do not fully utilize the spatial information in an image, which lowers clustering performance for images with low contrast, vague boundaries, and noise. To overcome this issue, we propose an enhanced spatial fuzzy c-means (ESFCM) algorithm that accounts for the influence of neighboring pixels on the center pixel by assigning weights to the neighbors in a $3{\times}3$ square window. To compare the proposed ESFCM with various FCM-based segmentation algorithms, we used clustering validity functions such as the partition coefficient ($V_{pc}$), partition entropy ($V_{pe}$), and the Xie-Beni function ($V_{xb}$). Experimental results show that the proposed ESFCM outperforms the other FCM-based algorithms in terms of these clustering validity functions.
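The three validity functions mentioned in this abstract can be computed from a membership matrix `u` (rows are points, columns are clusters) together with the data and cluster centers. The sketch below uses the standard textbook definitions, which are assumed rather than taken from the paper: $V_{pc}$ closer to 1 and $V_{pe}$ closer to 0 indicate a crisper partition, and a lower $V_{xb}$ indicates compact, well-separated clusters. The function names are illustrative, and the ESFCM neighborhood weighting itself is not shown.

```python
import math


def partition_coefficient(u):
    """V_pc = (1/n) * sum_i sum_k u_ik^2; equals 1 for a crisp partition."""
    return sum(x * x for row in u for x in row) / len(u)


def partition_entropy(u):
    """V_pe = -(1/n) * sum_i sum_k u_ik * ln(u_ik); equals 0 for a crisp partition."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / len(u)


def xie_beni(points, centers, u, m=2.0):
    """V_xb = sum_i sum_k u_ik^m ||x_i - v_k||^2 / (n * min_{k != l} ||v_k - v_l||^2)."""
    n = len(points)
    compactness = sum(u[i][k] ** m * math.dist(points[i], centers[k]) ** 2
                      for i in range(n) for k in range(len(centers)))
    separation = min(math.dist(a, b) ** 2
                     for idx, a in enumerate(centers) for b in centers[idx + 1:])
    return compactness / (n * separation)
```

When comparing segmentation algorithms with these indices, the same data and number of clusters should be used for each, since all three depend on the number of clusters.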

A Study On Predicting Stock Prices Of Hallyu Content Companies Using Two-Stage k-Means Clustering (2단계 k-평균 군집화를 활용한 한류컨텐츠 기업 주가 예측 연구)

  • Kim, Jeong-Woo
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.169-179 / 2021
  • This study shows that the two-stage k-means clustering method can improve performance in stock price prediction. To this end, it introduces the two-stage k-means clustering algorithm and tests its prediction performance against various machine learning techniques. The method selects the cluster closest to the prediction target from an initial k-means clustering, then reapplies k-means clustering within that cluster to find a cluster even closer to the actual value. As a result, the values predicted by this method are closer to the actual stock prices than those of the other machine learning techniques. Furthermore, it yields relatively stable predictions despite using a relatively small cluster. Accordingly, this method can improve both the accuracy and the stability of prediction and can be considered a new clustering method useful for small data. Future work should extend two-stage k-means clustering to large-scale data.
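The two-stage procedure described in this abstract can be sketched as follows, under assumptions that are not from the paper: each historical sample is a feature tuple paired with a known target price, the prediction is the mean target of the final sub-cluster, and k-means is seeded deterministically by farthest-first traversal. The function `two_stage_predict`, its `k1`/`k2` parameters, and the feature encoding are all hypothetical, not the author's implementation.

```python
import math


def _farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda p: min(math.dist(p, c) for c in cents)))
    return cents


def kmeans(points, k, iters=50):
    """Lloyd's k-means with farthest-first seeding; returns (centroids, labels)."""
    cents = _farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                cents[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return cents, labels


def two_stage_predict(features, targets, query, k1=3, k2=2):
    """Hypothetical two-stage k-means predictor (illustration only).

    Stage 1 clusters all samples and keeps the non-empty cluster whose centroid
    is nearest the query; stage 2 re-clusters only those samples and returns
    the mean target of the non-empty sub-cluster nearest the query.
    """
    cents, labels = kmeans(features, k1)
    c1 = min(set(labels), key=lambda c: math.dist(query, cents[c]))
    idx = [i for i, lab in enumerate(labels) if lab == c1]
    sub = [features[i] for i in idx]
    # A small first-stage cluster may hold fewer than k2 samples.
    sub_cents, sub_labels = kmeans(sub, min(k2, len(sub)))
    c2 = min(set(sub_labels), key=lambda c: math.dist(query, sub_cents[c]))
    chosen = [targets[idx[j]] for j, lab in enumerate(sub_labels) if lab == c2]
    return sum(chosen) / len(chosen)
```

The second stage works only on the samples of the first-stage cluster, which is why the final prediction rests on a relatively small group of similar samples, matching the abstract's remark about small clusters.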

Nonlinear Process Modeling Using Hard Partition-based Inference System (Hard 분산 분할 기반 추론 시스템을 이용한 비선형 공정 모델링)

  • Park, Keon-Jun;Kim, Yong-Kab
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.7 no.4 / pp.151-158 / 2014
  • In this paper, we introduce an inference system based on the hard scatter partition method and use it to model a nonlinear process. The hard scatter partition method partitions the input space in scatter form, with membership degrees of either 0 or 1. The proposed method is implemented with the C-Means clustering algorithm; to compensate for the algorithm's sensitivity to initial center values, the initial centers are obtained by binary splitting with the LBG algorithm. The hard-scatter-partitioned input space forms the rules of the rule-based system model. The premise parameters of the rules are determined by the membership matrix produced by C-Means clustering. The consequent part of each rule is expressed as a polynomial function, and its coefficients are determined by the standard least-squares method. Data widely used in nonlinear process modeling are used to build the model and evaluate its characteristics.
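The LBG binary-split initialization mentioned in this abstract can be sketched as follows: start from the global mean, split every center into two slightly perturbed copies, refine with hard C-means (k-means), and repeat until the desired number of centers is reached. This is a minimal plain-Python sketch assuming Euclidean distance, multiplicative splitting by `(1 + eps)` and `(1 - eps)`, and a power-of-two number of centers; `lbg_codebook` is an illustrative name, not the paper's code.

```python
import math


def lbg_codebook(points, n_centers, eps=0.01, iters=50):
    """LBG-style binary splitting followed by hard C-means (k-means) refinement.

    Starts from the global mean and doubles the codebook at each stage by
    replacing every center c with c*(1+eps) and c*(1-eps).
    """
    if n_centers < 1 or n_centers & (n_centers - 1):
        raise ValueError("this sketch assumes n_centers is a power of two")
    dim = len(points[0])
    centers = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    while len(centers) < n_centers:
        # Binary split: each center becomes two slightly perturbed copies.
        centers = [tuple(x * f for x in c) for c in centers for f in (1 + eps, 1 - eps)]
        for _ in range(iters):
            # Hard assignment: membership is 1 for the nearest center, 0 otherwise.
            labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
            new = []
            for k, c in enumerate(centers):
                members = [p for p, lab in zip(points, labels) if lab == k]
                new.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else c)
            if new == centers:  # centers stopped moving
                break
            centers = new
    return centers
```

Multiplicative splitting leaves a center at the origin unchanged, so zero-mean data would need an additive perturbation instead. The final 0/1 assignment of each point to its nearest center is the kind of hard membership a hard scatter partition of the input space relies on.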

An Introduction of Two-Step K-means Clustering Applied to Microarray Data (마이크로 어레이 데이터에 적용된 2단계 K-means 클러스터링의 소개)

  • Park, Dae-Hoon;Kim, Youn-Tae;Kim, Sung-Shin;Lee, Choon-Hwan
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.2 / pp.167-172 / 2007
  • Long gene sequences and their products have been studied by many methods. DNA (deoxyribonucleic acid) microarray technology has produced an enormous amount of data that is difficult to analyze with typical research methods. This paper proposes analyzing such mass data using division clustering with the K-means clustering algorithm. To demonstrate the superiority of the proposed method, it was used to analyze microarray data from rice DNA. Comparing the results with those of the existing K-means method shows that the proposed method is more useful while also effectively reducing running time.

Clustering-based Monitoring and Fault detection in Hot Strip Roughing Mill (군집기반 열간조압연설비 상태모니터링과 진단)

  • SEO, MYUNG-KYO;YUN, WON YOUNG
    • Journal of Korean Society for Quality Management / v.45 no.1 / pp.25-38 / 2017
  • Purpose: A hot strip rolling mill consists of many mechanical and electrical units. In the condition monitoring and diagnosis phase, various units can fail for unknown reasons. In this study, we propose an effective method for early detection of units with abnormal status to minimize system downtime. Methods: The early warning problem for various units is defined. The K-means and PAM algorithms with Euclidean and Manhattan distances were applied to detect abnormal status, and the performance of the proposed algorithm is investigated through field data analysis. Results: PAM with Manhattan distance (PAM_ManD) showed better results than the K-means algorithm with Euclidean distance (K-means_ED). In addition, multivariate field data analysis showed that the system reliability of the hot strip rolling mill can be increased by detecting abnormal status early. Conclusion: In this paper, a clustering-based monitoring and fault detection algorithm using Manhattan distance is proposed. Experiments are performed to study the benefit of PAM with Manhattan distance over K-means with Euclidean distance.
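The PAM (k-medoids) side of this comparison can be sketched as a swap search: medoids are actual data points, and the algorithm repeatedly applies the single medoid/non-medoid swap that most reduces the total Manhattan distance. This is a simplified sketch under stated assumptions (it starts from the first `k` points instead of PAM's BUILD phase, and `pam` and `manhattan` are illustrative names), not the authors' monitoring system.

```python
from itertools import product


def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=manhattan, max_iter=100):
    """Simplified PAM: start from the first k points as medoids, then keep
    applying the best single swap until no swap lowers the total cost.

    Returns (medoid indices, labels), where each label is the index of the
    point's nearest medoid.
    """
    medoids = list(range(k))
    cost = total_cost(points, medoids, dist)
    for _ in range(max_iter):
        best_cost, best_medoids = cost, None
        for mi, o in product(range(k), range(len(points))):
            if o in medoids:
                continue
            trial = medoids[:mi] + [o] + medoids[mi + 1:]
            c = total_cost(points, trial, dist)
            if c < best_cost:
                best_cost, best_medoids = c, trial
        if best_medoids is None:
            break  # no improving swap: local optimum reached
        cost, medoids = best_cost, best_medoids
    labels = [min(medoids, key=lambda m: dist(p, points[m])) for p in points]
    return medoids, labels
```

Because medoids must be real observations and the L1 distance grows only linearly with deviations, this combination is typically less sensitive to outlying sensor readings than k-means with Euclidean distance.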