• Title/Summary/Keyword: K means clustering


A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • In the past, researchers mainly used supervised machine learning to analyze power data and applied data mining techniques to identify patterns. Today, however, as electric power data grow in size and can be provided in real time, conventional data classification and analysis techniques have reached their limits. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process addresses the problems of the K-means algorithm, an unsupervised learning technique, and can automate the entire process from the collection of electric power data to their analysis. The power data are categorized and analyzed at three levels: the raw data level, the clustering level, and the user interface level. In addition, the authors determine K, the ideal number of clusters, based on principal component analysis and the normal distribution, and propose a modified K-means algorithm that reduces the data classified as outliers in order to increase clustering efficiency.
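The abstract does not spell out how K is chosen or how outliers are removed. As a minimal sketch under stated assumptions, the Python below drops points whose distance to the data mean has a z-score above a threshold (a hypothetical stand-in for the paper's normal-distribution criterion) and then runs standard K-means (Lloyd's algorithm) on the remaining points. The names `trim_outliers`, `init_farthest`, `kmeans` and the threshold `z` are illustrative, not taken from the paper, and the PCA-based choice of K is not reproduced.

```python
import math


def trim_outliers(points, z=2.0):
    """Drop points whose distance to the mean has a z-score above `z`.

    Hypothetical stand-in for the paper's normal-distribution-based outlier rule.
    """
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = [math.dist(p, mean) for p in points]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    if sigma == 0:
        return list(points)  # every point equally far from the mean: nothing to trim
    return [p for p, d in zip(points, dists) if (d - mu) / sigma <= z]


def init_farthest(points, k):
    """Deterministic farthest-first seeding: each new seed is the point farthest from existing seeds."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        far = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(list(far))
    return seeds


def kmeans(points, k, iters=100):
    """Standard K-means (Lloyd's algorithm); returns (centroids, labels)."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the previous centroid if a cluster becomes empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels
```

With `data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]`, `trim_outliers(data)` drops `(50, 50)` at the default `z=2.0`, and `kmeans` then separates the two remaining groups of three.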

Tree-structured Clustering for Mixed Data (혼합형 데이터에 대한 나무형 군집화)

  • Yang Kyung-Sook;Huh Myung-Hoe
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.271-282 / 2006
  • The aim of this study is to propose a tree-structured clustering method for mixed data. We suggest a scaling method to reduce the variable selection bias among categorical variables. In numerical examples such as the credit data and the German credit data, we observe several differences between tree-structured clustering and K-means clustering.

An Efficient Clustering Algorithm based on Heuristic Evolution (휴리스틱 진화에 기반한 효율적 클러스터링 알고리즘)

  • Ryu, Joung-Woo;Kang, Myung-Ku;Kim, Myung-Won
    • Journal of KIISE: Software and Applications / v.29 no.1_2 / pp.80-90 / 2002
  • Clustering is a useful technique for grouping data points such that points within a single group (cluster) have similar characteristics. Many clustering algorithms have been developed and used in engineering applications such as pattern recognition and image processing. Recently, clustering has drawn increasing attention as one of the important techniques in data mining. However, clustering algorithms such as K-means and Fuzzy C-means suffer from two difficulties: the number of clusters must be determined a priori, and the clustering results depend on the initial set of clusters, which can prevent them from reaching desirable results. In this paper, we propose a new clustering algorithm that solves these problems. Our method uses an evolutionary algorithm to solve the local optima problem, in which clustering converges to an undesirable state when started from an inappropriate set of clusters. We also adopt a new measure of how well the data are clustered, defined in terms of both intra-cluster dispersion and inter-cluster separability. Using this measure, our method determines the number of clusters automatically as a result of the optimization process. We also combine heuristics, i.e., problem-specific knowledge, with the evolutionary algorithm to speed up its search. We have evaluated our algorithm on several multi-dimensional data sets, and the results show that our algorithm outperforms the existing algorithms.
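The abstract describes a quality measure built from intra-cluster dispersion and inter-cluster separability but does not give its formula. As an illustration of that kind of measure, the sketch below computes the well-known Davies-Bouldin index (lower is better); this is a swapped-in example, not necessarily the authors' measure. In an evolutionary search, candidate partitions with different numbers of clusters could be scored this way so that the number of clusters falls out of the optimization; the search itself is not shown, and the function names are illustrative.

```python
import math


def centroid(cluster):
    """Mean vector of a list of points."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (list of point lists); lower means better clustering.

    Needs at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # intra-cluster dispersion: mean distance of members to their own centroid
    disp = [sum(math.dist(p, cents[i]) for p in c) / len(c) for i, c in enumerate(clusters)]
    k = len(clusters)
    worst = []
    for i in range(k):
        # combined dispersion over centroid separation, taking the worst other cluster
        worst.append(max((disp[i] + disp[j]) / math.dist(cents[i], cents[j])
                         for j in range(k) if j != i))
    return sum(worst) / k
```

Splitting two well-separated triples of points correctly yields a much lower index than a split that mixes them, which is the property a fitness function for such a search needs.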

Analysis of Chicken Feather Color Phenotypes Classified by K-Means Clustering using Reciprocal F2 Chicken Populations (K-Means Clustering으로 분류한 닭 깃털색 표현형의 분석)

  • Park, Jongho;Heo, Seonyeong;Kim, Minjun;Cho, Eunjin;Cha, Jihye;Jin, Daehyeok;Koh, Yeong Jun;Lee, Seung-Hwan;Lee, Jun Heon
    • Korean Journal of Poultry Science / v.49 no.3 / pp.157-165 / 2022
  • Chickens are a vertebrate species with varying colors. To find color-related genes, the various colors of chickens must be classified. In the past, color scoring was performed by human visual observation, so chicken colors have not been measured against precise standards. To solve this problem, this study used a computer vision approach. Image quantization that applies k-means clustering to the RGB values of all pixels can objectively distinguish inherited colors that are expressed in various ways. This study was also conducted to determine whether plumage color differences exist between the reciprocal cross lines of two breeds: black Yeonsan Ogye (YO) and White Leghorn (WL). Line B is a crossbred line between YO males and WL females, while Line L is the reciprocal crossbred line between WL males and YO females. One male and ten females were selected from each F1 line, and full-sib mating was conducted to generate 883 F2 birds. The results indicate that the distribution of light and dark colors obtained by k-means clustering converged to 7:3. Additionally, the color of Line B was lighter than that of Line L (P<0.01). This study suggests that the genes underlying plumage colors can be identified using quantification values from the computer vision approach described here.
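The pixel-level quantization described above can be sketched as K-means with k = 2 over RGB triples, after which the cluster whose centroid has the larger channel sum is labeled "light". This is a minimal illustration assuming the pixels are already extracted as `(R, G, B)` tuples (image loading and feather segmentation are omitted) and using the channel sum as a simple brightness proxy; the function names are hypothetical and not from the study.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first seeding; returns (centroids, labels)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the previous centroid if a cluster becomes empty
            new.append([sum(ch) / len(members) for ch in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


def light_dark_fractions(pixels):
    """Quantize RGB pixels into two colors and return (light_fraction, dark_fraction)."""
    centroids, labels = kmeans(pixels, 2)
    light = max(range(2), key=lambda c: sum(centroids[c]))  # brighter centroid is "light"
    n_light = sum(1 for lab in labels if lab == light)
    return n_light / len(pixels), 1 - n_light / len(pixels)
```

On a toy input of seven near-white and three near-black pixels, `light_dark_fractions` returns roughly `(0.7, 0.3)`; this only illustrates the mechanics and is not the study's data.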

K-Means Clustering with Deep Learning for Fingerprint Class Type Prediction

  • Mukoya, Esther;Rimiru, Richard;Kimwele, Michael;Mashava, Destine
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.29-36 / 2022
  • In deep learning classification tasks, most models assume that labels are available for all of the training data. As such, strategies for learning new concepts from unlabeled datasets are scarce. In fingerprint classification, most fingerprint datasets are labelled by subject (individual), and datasets labelled with finger class types are scarce. In this paper, the authors develop approaches for classifying fingerprint images into the major known fingerprint classes. Our study provides a flexible method for learning new classes of fingerprints. Our classifier model combines a clustering technique with deep learning to cluster the fingerprint images and hence label them with the appropriate classes. The K-means clustering strategy explores label uncertainty and high-density regions in the unlabeled data to be clustered. Using a similarity index, five clusters are created. Deep learning is then used to train a model on a publicly available fingerprint dataset with known finger class types. A prediction technique then uses the trained model to predict the classes of the clusters. Our proposed model performs better and has lower computational costs in learning new classes, and hence significantly reduces the cost of labelling fingerprint images.
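The final step, naming each K-means cluster with a class predicted by a model trained on labeled data, can be illustrated with a majority vote over the classifier's predictions for each cluster's members. This is a hedged sketch: the abstract does not specify the prediction technique, the trained deep model is represented only by its list of predicted classes, and the function names and class names in the example are illustrative.

```python
from collections import Counter


def label_clusters(cluster_ids, predicted_classes):
    """Map each cluster id to the class most often predicted for its members (majority vote)."""
    votes = {}
    for cid, cls in zip(cluster_ids, predicted_classes):
        votes.setdefault(cid, Counter())[cls] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


def propagate_labels(cluster_ids, mapping):
    """Give every image the class assigned to its cluster."""
    return [mapping[cid] for cid in cluster_ids]
```

In this sketch one label is chosen per cluster rather than per image, so a few individual misclassifications do not change the label that the whole cluster receives.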

Fiscal Policy Effectiveness Assessment Based on Cluster Analysis of Regions

  • Martynenko, Valentyna;Kovalenko, Yuliia;Chunytska, Iryna;Paliukh, Oleksandr;Skoryk, Maryna;Plets, Ivan
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.75-84 / 2022
  • The efficiency of regional fiscal policy implementation rests on achieving target criteria in the formation and distribution of local budgets' own financial resources, reducing their deficits, and reducing their dependence on transfers. It is also relevant to compare how the financial autonomy of regions develops in the course of the decentralisation of fiscal relations. The study presents a cluster analysis of the effectiveness of fiscal policy implementation across 24 regions and the capital city of Kyiv (excluding temporarily occupied territories) under conditions of fiscal decentralisation. The regions of Ukraine were clustered by 18 indicators of fiscal policy implementation efficiency using Ward's minimum variance method and the k-means clustering algorithm. As a result, the regions of Ukraine are grouped into 5 homogeneous clusters. For each cluster, measures were developed to increase own revenues and minimize dependence on official transfers, thereby increasing the financial autonomy of the regions. The results show that clustering algorithms are an effective tool for assessing the effectiveness of fiscal policy implementation at the regional level and for stimulating further financial decentralisation of the regions.
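The abstract names both Ward's minimum variance method and k-means but not how they were combined. One common pairing, assumed here, standardizes the indicators, runs Ward's agglomerative clustering down to k groups, and uses those group means as the starting centroids for k-means. The sketch below follows that assumption with a naive Ward merge that is adequate for a few dozen regions; the function names and toy rows are illustrative.

```python
import math
from itertools import combinations


def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def standardize(rows):
    """Z-score each indicator (column) so that no indicator dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) or 1.0 for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(r, means, sds)] for r in rows]


def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def merge_cost(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            # increase in within-cluster sum of squares caused by merging a and b
            return len(a) * len(b) / (len(a) + len(b)) * math.dist(mean_vec(a), mean_vec(b)) ** 2
        i, j = min(combinations(range(len(clusters)), 2), key=merge_cost)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def kmeans(points, centroids, iters=100):
    """K-means started from the given centroids; returns (labels, centroids)."""
    def assign(cs):
        return [min(range(len(cs)), key=lambda c: math.dist(p, cs[c])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        new = [
            mean_vec([p for p, lab in zip(points, labels) if lab == c]) if c in labels else centroids[c]
            for c in range(len(centroids))
        ]
        if new == centroids:
            break
        centroids = new
    return assign(centroids), centroids


def ward_then_kmeans(rows, k):
    """Standardize indicators, then seed k-means with the Ward cluster means."""
    pts = standardize(rows)
    seeds = [mean_vec(c) for c in ward(pts, k)]
    return kmeans(pts, seeds)
```

For the study described above, `rows` would hold one 18-indicator vector per region (25 rows) and `k` would be 5; seeding from Ward's result removes the dependence of k-means on a random start.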

Nucleus Recognition of Uterine Cervical Pap-Smears using FCM Clustering Algorithm

  • Kim, Kwang-Baek
    • Journal of information and communication convergence engineering / v.6 no.1 / pp.94-99 / 2008
  • Segmenting the nucleus region in uterine cervical cytodiagnosis images is known to be the most difficult and important part of an automatic cervical cancer recognition system. In this paper, the nucleus region is extracted from an image of uterine cervical cytodiagnosis using the HSI color model. The characteristics of the nucleus are then extracted from the detected nucleus region by analyzing morphometric, densitometric, colorimetric, and textural features. The classification criterion for a nucleus is defined according to the standard categories of the Bethesda system. The fuzzy C-means clustering algorithm is applied to the extracted nuclei, and the results show that the proposed method is efficient in nucleus recognition and uterine cervical Pap smear extraction.
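For reference, the standard fuzzy C-means iteration (Bezdek's formulation) alternates two updates: a point's membership in cluster i is 1 / sum over k of (d_i / d_k)^(2/(m-1)), where d_i is its distance to centre i and m > 1 is the fuzzifier, and each centre is the mean of all points weighted by membership^m. The sketch below implements those updates on generic feature vectors; it assumes m = 2 and caller-supplied initial centres, and does not reproduce the paper's nucleus features or Bethesda-based criterion. The function names are illustrative.

```python
import math


def memberships(points, centers, m):
    """Membership matrix u[j][i] of point j in cluster i (each row sums to 1)."""
    u = []
    for p in points:
        d = [math.dist(p, v) for v in centers]
        if 0.0 in d:
            # the point sits exactly on one or more centres: share membership among them
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((di / dk) ** (2 / (m - 1)) for dk in d) for di in d])
    return u


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and centre updates until the centres move less than `tol`.

    Assumes every cluster keeps some nonzero membership (true unless points coincide with other centres).
    """
    centers = [list(c) for c in centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]  # weights u^m
            total = sum(w)
            new.append([sum(wj * p[k] for wj, p in zip(w, points)) / total
                        for k in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

Unlike K-means, each point keeps a graded membership in every cluster, which is what makes the method attractive for borderline nuclei.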

Non-Keyword Model for the Improvement of Vocabulary Independent Keyword Spotting System (가변어휘 핵심어 검출 성능 향상을 위한 비핵심어 모델)

  • Kim, Min-Je;Lee, Jung-Chul
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.319-324 / 2006
  • We propose two new methods of non-keyword modeling to improve the performance of a speaker- and vocabulary-independent keyword spotting system. The first method is state-level decision tree clustering of monophones instead of monophone clustering based on the K-means algorithm. The second method is multi-state multiple-mixture modeling at the syllable level rather than a single-state multiple-mixture model for the non-keyword. To evaluate our methods, we used the ETRI speech DB for training and for a keyword spotting test (closed test). We also conducted an open test to spot 100 keywords in 400 sentences uttered by 4 speakers in an office environment. The experimental results showed that, in keyword spotting, the decision tree-based state clustering method improved performance by 28%/29% (closed/open test) over the K-means-based monophone clustering method, and that multi-state non-keyword modeling at the syllable level improved performance by 22%/2% (closed/open test) over the single-state non-keyword model. These results show that the two proposed methods improve keyword spotting performance.

Characterization of Premature Ventricular Contraction by K-Means Clustering Learning Algorithm with Mean-Reverting Heart Rate Variability Analysis (평균회귀 심박변이도의 K-평균 군집화 학습을 통한 심실조기수축 부정맥 신호의 특성분석)

  • Kim, Jeong-Hwan;Kim, Dong-Jun;Lee, Jeong-Whan;Kim, Kyeong-Seop
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.7 / pp.1072-1077 / 2017
  • Mean-reverting analysis refers to a way of estimating the underlying tendency after new data have disturbed the equilibrium state. In this paper, we propose a new method to interpret the characteristic patterns of Premature Ventricular Contraction (PVC) arrhythmia by applying the K-means unsupervised learning algorithm to electrocardiogram (ECG) data. To this end, we applied a mean-reverting model to analyze Heart Rate Variability (HRV) in terms of a modified Poincaré plot, treating the PVC rhythm as a component that disrupts the homeostatic state. Experimental tests on the MIT-BIH ECG database showed that the patterns portrayed by K-means clustering on mean-reverting HRV data are more clearly visible, and that the Euclidean distance between cluster centroids can be used to identify the discrepancy between normal sinus rhythm and PVC beats.
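A Poincaré plot pairs each RR interval with the next one, (RR_n, RR_n+1). The sketch below builds those pairs, clusters them with K-means (k = 2), and reports the Euclidean distance between the two centroids. It is a minimal illustration only: the mean-reverting transformation and the modified plot used in the paper are not reproduced, RR intervals are assumed to be given in seconds, and the function names are hypothetical.

```python
import math


def poincare_pairs(rr):
    """Successive RR-interval pairs (RR_n, RR_n+1) for a Poincare plot."""
    return list(zip(rr[:-1], rr[1:]))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first seeding; returns (centroids, labels)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the previous centroid if a cluster becomes empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


def centroid_distance(rr):
    """Euclidean distance between the two K-means centroids of the Poincare points."""
    centroids, _ = kmeans(poincare_pairs(rr), 2)
    return math.dist(centroids[0], centroids[1])
```

A perfectly regular rhythm yields two coincident centroids (distance near zero), whereas a series containing a change in RR interval pulls the centroids apart, which is the kind of separation the abstract attributes to PVC beats.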