• Title/Summary/Keyword: Automated K-means clustering

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.1
    • /
    • pp.55-67
    • /
    • 2015
  • An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask it; outlier detection is also a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.
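
The abstract above does not spell out the hybrid outlier rule. As a rough illustration only, the following Python sketch flags a point when its distance to its own cluster center is far larger than is typical for that cluster. The function name `flag_outliers`, the median/MAD cutoff, and the default threshold of 3 are assumptions made for this example, not the rule used in the paper.

```python
import math
from statistics import median


def flag_outliers(points, labels, centers, threshold=3.0):
    """Flag points that lie unusually far from their own cluster center.

    Hypothetical rule (not taken from the paper): within each cluster, a
    point is an outlier when its distance to the center exceeds
    median + threshold * MAD of the distances in that cluster.
    """
    # Distance of every point to the center of the cluster it is assigned to.
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    flags = [False] * len(points)
    for cluster in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cluster]
        cluster_d = [dists[i] for i in idx]
        med = median(cluster_d)
        mad = median([abs(d - med) for d in cluster_d])
        cutoff = med + threshold * mad
        for i in idx:
            # With zero spread (mad == 0) no point is flagged.
            flags[i] = mad > 0 and dists[i] > cutoff
    return flags


# Two clusters with known centers; one point sits far from cluster 0.
points = [(1, 0), (0, 2), (3, 0), (0, 4), (5, 0), (0, 40),
          (101, 100), (100, 102), (103, 100)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
centers = [(0, 0), (100, 100)]
flags = flag_outliers(points, labels, centers)
```

Because the cutoff is computed per cluster, a tight cluster and a diffuse cluster are judged on their own scales, which is one simple way to combine cluster membership with a distance criterion.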

A Variable Selection Procedure for K-Means Clustering

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.3
    • /
    • pp.471-483
    • /
    • 2012
  • One of the most important problems in cluster analysis is selecting the variables that truly define cluster structure while eliminating noisy variables that mask such structure. Brusco and Cradit (2001) present the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure combining VS-KM with the automated K-means procedure provided by Kim (2009). This automated variable selection procedure for K-means clustering calculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The R programs implementing the procedure can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
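
Both VS-KM and the updated procedure rely on the adjusted Rand index to compare partitions. The following Python sketch computes the index in its standard pair-counting form; the function name `adjusted_rand_index` is chosen for this example, and how the procedure uses the index to decide which variable to add is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same number of items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Cluster labels are arbitrary, so a relabeled copy still scores as identical.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index equals 1 for identical partitions, is close to 0 for partitions that agree only by chance, and can be negative when agreement is worse than chance.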

Automated K-Means Clustering and R Implementation (자동화 K-평균 군집방법 및 R 구현)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.723-733
    • /
    • 2009
  • The crucial problems of K-means clustering are deciding the number of clusters and the initial cluster centroids. Hence, K-means clustering generally consists of a two-stage procedure. The first stage runs hierarchical clustering to obtain the number of clusters and the cluster centroids, and the second stage runs nonhierarchical K-means clustering using the results of the first stage. Here we provide an automated K-means clustering procedure that is useful for obtaining initial cluster centroids and can also be useful for large data sets, and we provide a software program implemented in R.
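
The general two-stage idea described above can be sketched in Python as follows. This is a minimal illustration, not the paper's automated procedure: the number of clusters `k` is passed in rather than determined automatically, the first stage uses a simple centroid-linkage merge rule, and all function names are chosen for this example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def hierarchical_centroids(points, k):
    """Stage 1: merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        _, i, j = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Stage 2: Lloyd's K-means started from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = hierarchical_centroids(points, k=2)
labels, centers = kmeans(points, initial)
```

The naive first stage recomputes all pairwise centroid distances at every merge, so it is only practical for small samples; for large data sets it would typically be run on a subsample.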

An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv
    • Journal of Information Processing Systems
    • /
    • v.20 no.2
    • /
    • pp.185-199
    • /
    • 2024
  • In this paper, an improved automated spectral clustering (IASC) algorithm is proposed to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to automatically determine the number of clusters. Firstly, a cluster number evaluation factor based on the optimal clustering principle is proposed. By iterating through different k values, the value corresponding to the largest evaluation factor is selected as the first-rank number of clusters. Secondly, the IASC algorithm adopts a density-sensitive distance to measure the similarity between sample points. This gives high similarity to data distributed in the same high-density area. Thirdly, to improve clustering accuracy, the IASC algorithm uses the cosine angle classification method instead of K-means to classify the eigenvectors. Six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) were compared with the proposed algorithm on six datasets. The results show that the IASC algorithm not only automatically determines the number of clusters but also obtains better clustering accuracy on both synthetic and UCI datasets.

Corrosion Image Monitoring of steel plate by using k-means clustering (k-means 클러스터링을 이용한 강판의 부식 이미지 모니터링)

  • Kim, Beomsoo;Kwon, Jaesung;Choi, Sungwoong;Noh, Jungpil;Lee, Kyunghwang;Yang, Jeonghyeon
    • Journal of the Korean institute of surface engineering
    • /
    • v.54 no.5
    • /
    • pp.278-284
    • /
    • 2021
  • Corrosion of steel plate is a common phenomenon that results in gradual destruction caused by a wide variety of environments. Corrosion monitoring is the tracking of degradation progress over a long period of time. Corrosion on steel plate appears as discoloration and irregularities on the surface. In this study, we developed a method for quantitatively evaluating the rust formed on steel plate by applying k-means clustering to the corroded area in a given image. The k-means clustering for automated corrosion detection was based on GrabCut segmentation and a Gaussian mixture model (GMM). The image color of the corroded surface at the cut-edge area was analyzed quantitatively in the HSV (Hue, Saturation, Value) color space.

Classification Methods for Automated Prediction of Power Load Patterns (전력 부하 패턴 자동 예측을 위한 분류 기법)

  • Minghao, Piao;Park, Jin-Hyung;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06c
    • /
    • pp.26-30
    • /
    • 2008
  • An automated methodology based on data mining techniques is presented for the prediction of customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data pre-processing: noise and outliers are removed and continuous attribute-valued features are transformed to discrete values; (ii) cluster analysis: k-means clustering is used to create load pattern classes and the representative load profile for each class; and (iii) classification: several supervised learning methods are evaluated in order to select a suitable prediction method. In the proposed methodology, power load measured by the AMR (automatic meter reading) system, as well as customer indexes, were used as inputs for clustering. The output of clustering was the classification of representative load profiles (or classes). To evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korea power system, with the class labels derived from clustering and other features used as inputs to build the classifiers. Lastly, the results of our experiments are presented.

Classification of Fuzzy Logic on the Optimized Bead Geometry in the Gas Metal Arc Welding

  • Yu Xue;Kim, Ill-Soo;Park, Chang-Eun;Kim, In-Ju;Son, Joon-Sik
    • Proceedings of the Korean Society of Machine Tool Engineers Conference
    • /
    • 2004.10a
    • /
    • pp.225-232
    • /
    • 2004
  • Recently, there has been rapid development in computer technology, which has in turn led to the development of automated welding systems using Artificial Intelligence (AI). However, a fully automated welding system has not been achieved due to difficulties with control and sensor technologies. In this paper, the classification of the optimized bead geometry, such as bead width, height, penetration, and bead area, in Gas Metal Arc (GMA) welding with fuzzy logic is presented. The fuzzy C-Means (FCM) algorithm, which is the best-known unsupervised fuzzy clustering algorithm, is employed here to analyze the bead geometry specimens. The quality of the GMA welding can then be classified by this fuzzy clustering technique, and the choice for obtaining the optimal bead geometry can also be determined.

An Optimized Partner Searching System for B2B Marketplace Applying Clustering Techniques (군집화 기법을 이용한 B2B Marketplace상의 최적 파트너 검색 시스템)

  • Kim Shin-Young;Kim Soo-Young
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2003.05a
    • /
    • pp.572-579
    • /
    • 2003
  • With the expansion of e-commerce, the e-marketplace has become one of the most discussed topics in recent years. Limited theoretical work, however, has been done to optimize the practical use of e-marketplace systems. Other potential issues aside, this research focuses on one problem: 'the participants waste too much time, effort and cost to find out their best partner in B2B marketplace.' To solve this problem, this paper proposes a system that provides the user-company with an automated and customized brokering service. The proposed system assesses the weights of a user-company's priorities and runs a two-stage clustering algorithm using a self-organizing map and the K-means clustering technique. Subsequently, the system shows the clustering result and a user guideline. This system makes transactions in a B2B marketplace more efficient by reducing the pool of partners to be searched.

A Study on Labeling of ECG Signal using Fuzzy Clustering (퍼지 클러스터링을 이용한 심전도 신호의 라벨링에 관한 연구)

  • Kong, I.W.;Lee, J.W.;Lee, S.H.;Choi, S.J.;Lee, M.H.
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1996 no.11
    • /
    • pp.118-121
    • /
    • 1996
  • This paper describes ECG signal labeling based on fuzzy clustering, which is necessary for automated ECG diagnosis. The NPPA (non-parametric partitioning algorithm) compares the correlations of waveforms and tends to recognize identical waveforms as different when they have a little morphological variation. We propose applying fuzzy clustering to ECG QRS complex labeling, which prevents the errors caused by if-then comparison. The process is divided into two parts. The first part is parameter extraction from the ECG signal, which is composed of filtering, QRS detection by mapping to a phase space using time-delay coordinates, and generation of characteristic vectors. The second part is fuzzy clustering by FCM (fuzzy c-means), which is composed of clustering, assessment of cluster validity, and labeling.
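
The fuzzy c-means step named above can be sketched in Python with the standard membership and center updates. This is an illustration under stated assumptions: the fuzzifier `m = 2`, the caller-supplied initial centers, the rule of labeling each point by its largest membership, and the function name are choices made for this example; the paper's feature extraction and cluster validity assessment are not reproduced.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means: alternate membership and center updates.

    Returns (memberships, centers), where memberships[i][j] is the degree to
    which point i belongs to cluster j (each row sums to 1).
    """
    centers = [tuple(c) for c in centers]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if 0.0 in d:
                # Point coincides with a center: full membership there.
                row = [1.0 if x == 0.0 else 0.0 for x in d]
                total = sum(row)
                row = [x / total for x in row]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            memberships.append(row)
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # no support: keep old center
                continue
            new_centers.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships, centers


# One-dimensional toy features standing in for characteristic vectors.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
memberships, centers = fuzzy_c_means(points, centers=[(0.0,), (12.0,)])
# Hard labels: assign each point to the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Unlike a hard if-then comparison, the membership degrees change smoothly with distance, so a small morphological variation shifts a waveform's memberships slightly instead of flipping its label outright.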

Improved Algorithm for Fully-automated Neural Spike Sorting based on Projection Pursuit and Gaussian Mixture Model

  • Kim, Kyung-Hwan
    • International Journal of Control, Automation, and Systems
    • /
    • v.4 no.6
    • /
    • pp.705-713
    • /
    • 2006
  • For the analysis of multiunit extracellular neural signals as multiple spike trains, neural spike sorting is essential. Existing algorithms for spike sorting have been unsatisfactory when the signal-to-noise ratio (SNR) is low, especially for the implementation of fully automated systems. We present a novel method that shows satisfactory performance even under low SNR, and compare its performance with a recent method based on principal component analysis (PCA) and the fuzzy c-means (FCM) clustering algorithm. Our system consists of a spike detector that shows high performance under low SNR, a feature extractor that utilizes projection pursuit based on negentropy maximization, and an unsupervised classifier based on a Gaussian mixture model. It is shown that the proposed feature extractor gives better performance than PCA, and that the proposed combination of spike detection, feature extraction, and unsupervised classification yields much better performance than PCA-FCM, making the realization of fully automated unsupervised spike sorting more feasible.