• Title/Summary/Keyword: K means clustering

Curriculum Mining Analysis Using Clustering-Based Process Mining (군집화 기반 프로세스 마이닝을 이용한 커리큘럼 마이닝 분석)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.4 / pp.45-55 / 2015
  • In this paper, we consider curriculum mining as an application of process mining in the education domain. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, students' subject registration patterns are highly unstructured and complicated, yielding a so-called spaghetti model, because they contain many different cases and highly diverse behaviors. As a result, registration patterns are typically difficult to develop and analyze. Earlier work addressed this issue by clustering on features of students and their behaviors; however, such features are generally hard to obtain because they are private and qualitative. Therefore, we propose a new curriculum mining framework that applies K-means clustering based on subject attributes to address the problems caused by the unstructured process model. Specifically, we divide subject attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. For the categorical attributes, we suggest a method to quantify them by binarization. To choose the number of clusters for K-means clustering, we apply the elbow method using the R-squared value, which represents the proportion of variance explained by a given number of clusters. The performance of the suggested method was verified using a log of student registration data from 'A university' in terms of simplicity and fitness, which are typical performance measures for process models obtained in process mining.
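The binarization and elbow steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the subject records, the helper names (`one_hot`, `kmeans`, `r_squared`), and the choice to leave features unscaled are all assumptions made for the example. R-squared is taken here as 1 minus the ratio of within-cluster to total sum of squares.

```python
import random

def one_hot(values):
    """Binarize a categorical column: one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; initial centers are k randomly sampled points."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels, centers

def r_squared(points, labels, centers):
    """Share of total variance explained by the clustering."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1.0 - within / total

# Hypothetical subjects: (class classification, semester). Not data from the paper.
subjects = [("core", 1), ("core", 2), ("lab", 4), ("lab", 5), ("elective", 7), ("elective", 8)]
encoded = one_hot([s[0] for s in subjects])
features = [row + [float(s[1])] for row, s in zip(encoded, subjects)]

# R-squared for each candidate k; the "elbow" is where the gain flattens out.
r2_by_k = {k: r_squared(features, *kmeans(features, k)) for k in range(1, len(features) + 1)}
for k, r2 in r2_by_k.items():
    print(k, round(r2, 3))
```

In practice the numerical attributes would probably need scaling before being mixed with 0/1 indicators, since otherwise the semester column dominates the distance.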

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. The quality of topic words increased by $33\%$ with the following approach: for the top-level cluster, only nouns are extracted as topic words, and topic words already used are not reused for child clusters.
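The topic-word rule stated in this abstract (nouns only, no reuse further down the hierarchy) might look like the sketch below. The cluster structure (a dict with `terms` and `children`), the `"NOUN"` tag, and the scores are hypothetical. Applying the noun filter at every level and sharing one "used" set across the whole tree are assumptions, since the abstract does not spell out those details.

```python
def pick_topic_words(cluster, used=None):
    """Choose the best-scoring unused noun for a cluster, then recurse into children.

    cluster: {"terms": [(word, pos_tag, score), ...], "children": [...]}  (hypothetical layout)
    """
    used = set() if used is None else used
    nouns = [(score, word) for word, pos, score in cluster["terms"]
             if pos == "NOUN" and word not in used]
    topic = max(nouns)[1] if nouns else None
    if topic is not None:
        used.add(topic)  # later clusters may not reuse this word
    children = [pick_topic_words(child, used) for child in cluster.get("children", [])]
    return {"topic": topic, "children": children}

# Hypothetical two-level hierarchy
tree = {
    "terms": [("cluster", "NOUN", 0.9), ("fast", "ADJ", 0.95), ("document", "NOUN", 0.5)],
    "children": [
        {"terms": [("cluster", "NOUN", 0.8), ("retrieval", "NOUN", 0.6)]},
        {"terms": [("retrieval", "NOUN", 0.7), ("word", "NOUN", 0.4)]},
    ],
}
result = pick_topic_words(tree)
print(result)
```

The adjective "fast" scores highest at the root but is skipped because it is not a noun, and each child falls back to its next-best noun once the better one is taken.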

Analysis on the Distribution of RF Threats Using Unsupervised Learning Techniques (비지도 학습 기법을 사용한 RF 위협의 분포 분석)

  • Kim, Chulpyo;Noh, Sanguk;Park, So Ryoung
    • Journal of the Korea Institute of Military Science and Technology / v.19 no.3 / pp.346-355 / 2016
  • In this paper, we propose a method to analyze the clusters of RF threats emitting electrical signals, based on signal variables collected in integrated electronic warfare environments. We first analyze the signal variables collected by an electronic warfare receiver and construct a model based on the variables that characterize the threats. To visualize the distribution of RF threats and identify them in reverse, we use the k-means clustering algorithm and the self-organizing map (SOM) algorithm, both of which are unsupervised learning techniques. Using the model produced by the k-means and SOM algorithms, RF threats can be classified into one of the groups in the distribution of RF threats. In an experiment, we measure the classification accuracy of the algorithms and verify that the resulting model can be used to visually recognize the distribution of RF threats.
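For readers unfamiliar with SOMs, the following is a minimal one-dimensional self-organizing map, written only to illustrate the general technique named in the abstract. The two-feature "signal" vectors, the node count, the decay schedule, and the deterministic initialization are assumptions for the example and are not taken from the paper.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_nodes=2, epochs=20, lr0=0.5, sigma0=0.5):
    """1-D SOM (n_nodes >= 2) with linearly decaying learning rate and neighborhood width."""
    step = (len(data) - 1) / (n_nodes - 1)
    # Deterministic init: copy evenly spaced samples as the starting weights.
    weights = [list(data[round(i * step)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 1e-3)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood on the node grid: the BMU moves most.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical, well-separated groups of 2-feature signal vectors
group_a = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
group_b = [(9.0, 9.0), (9.2, 8.8), (8.9, 9.1)]
som = train_som(group_a + group_b)
units_a = [best_matching_unit(som, x) for x in group_a]
units_b = [best_matching_unit(som, x) for x in group_b]
print(units_a, units_b)
```

Each sample is assigned to its best-matching unit, which plays the same role as the nearest centroid in k-means; a real SOM would typically use a 2-D grid of many nodes so the map can be visualized.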

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.105-112 / 2018
  • As data size increases, a single database is not enough to serve the current volume of tasks. Since data is partitioned and stored across multiple databases, analysis should also support parallelism to increase efficiency. However, traditional analysis requires data to be transferred out of the database to nodes where the analytic service runs, and the user is required to know both the database and the analytic framework. In this paper, we propose an efficient way to perform the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.
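One common way to parallelize K-means over partitioned data, which may or may not match the authors' design, is to compute per-cluster sums and counts where each partition lives and ship only those small aggregates to a coordinator. The sketch below assumes that scheme; the function names and the sample partitions are hypothetical.

```python
def partial_stats(partition, centers):
    """Runs next to one data partition: per-cluster coordinate sums and point counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        nearest = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts

def merge_and_update(all_stats, centers):
    """Runs on the coordinator: combine partial aggregates into new centers."""
    k, dim = len(centers), len(centers[0])
    new_centers = []
    for j in range(k):
        count = sum(counts[j] for _, counts in all_stats)
        if count == 0:  # no points assigned anywhere: keep the old center
            new_centers.append(list(centers[j]))
            continue
        new_centers.append(
            [sum(sums[j][d] for sums, _ in all_stats) / count for d in range(dim)]
        )
    return new_centers

# Two hypothetical partitions, e.g. rows stored on two database nodes
partitions = [[(0, 0), (0, 2)], [(10, 10), (10, 12), (0, 1)]]
centers = [(0, 0), (10, 10)]
stats = [partial_stats(part, centers) for part in partitions]  # could run in parallel
new_centers = merge_and_update(stats, centers)
print(new_centers)
```

Only `k` sums and `k` counts leave each partition per iteration, so the raw rows never have to be moved out of the database nodes. The per-partition step has the shape of a grouped sum-and-count aggregate.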

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we use an improved FCBF (fast correlation-based filter), a feature selection method, on supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on the supervised data sets show that the performance of PCA is very poor for large-scale features.
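As a reminder of what PCA contributes before clustering, here is a small sketch that extracts the first principal component and projects the data onto it. Power iteration is used only because it fits in a few lines of plain Python; it is a stand-in chosen for this example, not the procedure used in the paper, and the sample data is made up.

```python
import math

def first_principal_component(rows, iters=100):
    """Return (unit direction, projected scores) for the leading principal component."""
    n, dim = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # Sample covariance matrix
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # starting vector; must not be orthogonal to the true component
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c * a for c, a in zip(r, v)) for r in centered]
    return v, scores

# Made-up points lying exactly on the line y = 2x
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

The one-dimensional `scores` could then be passed to k-means or k-medoids in place of the original features. PCA keeps the directions of largest variance, which are not necessarily the directions that separate clusters best.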

Clustering In Tied Mixture HMM Using Homogeneous Centroid Neural Network (Homogeneous Centroid Neural Network에 의한 Tied Mixture HMM의 군집화)

  • Park Dong-Chul;Kim Woo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.9C / pp.853-858 / 2006
  • TMHMM (Tied Mixture Hidden Markov Model) is an important approach to reducing the number of free parameters in speech recognition. However, this model suffers from degraded recognition accuracy due to its GPDF (Gaussian Probability Density Function) clustering error. This paper proposes a clustering algorithm, called HCNN (Homogeneous Centroid Neural Network), to cluster acoustic feature vectors in TMHMM. Moreover, the HCNN uses a heterogeneous distance measure to allocate more code vectors to the heterogeneous areas where the probability densities of different states overlap. When applied to isolated-word recognition of Korean digits, the HCNN reduces the error rate by 9.39% over CNN clustering and by 14.63% over traditional K-means clustering.

Design of Fuzzy Neural Networks Based on Fuzzy Clustering and Its Application (퍼지 클러스터링 기반 퍼지뉴럴네트워크 설계 및 적용)

  • Park, Keon-Jun;Lee, Dong-Yoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.1 / pp.378-384 / 2013
  • In this paper, we propose fuzzy neural networks based on the fuzzy c-means (FCM) clustering algorithm. Typically, fuzzy rule generation has the problem that the number of fuzzy rules increases exponentially as the input dimension increases. To solve this problem, the fuzzy rules of the proposed networks are generated by partitioning the input space in scatter form using the FCM clustering algorithm. The premise parameters of the fuzzy rules are determined from the membership matrix produced by the FCM clustering algorithm. The consequence part of the rules is expressed in the form of polynomial functions, and the fuzzy neural networks learn by adjusting the connections of the neurons with a back-propagation algorithm. The proposed networks are evaluated by applying them to a nonlinear process.
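The membership matrix mentioned in this abstract comes from the standard FCM update rules, sketched below in plain Python. The fuzzifier `m = 2`, the function names, and the sample points are assumptions for illustration. How the paper maps memberships to premise parameters is not shown here.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # Point sits exactly on one or more centers: split membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: means weighted by membership ** m."""
    k, dim = len(memberships[0]), len(points[0])
    centers = []
    for i in range(k):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers

# Made-up 2-D inputs and two starting centers
points = [(1.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
U = fcm_memberships(points, centers)
updated = fcm_centers(points, U)
print(U, updated)
```

Alternating the two functions until the memberships stop changing gives the usual FCM loop. Unlike k-means, every point contributes to every center, weighted by its degree of membership.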

Corrosion image analysis on galvanized steel by using superpixel DBSCAN clustering algorithm (슈퍼픽셀 DBSCAN 군집 알고리즘을 이용한 용융아연도금 강판의 부식이미지 분석)

  • Kim, Beomsoo;Kim, Yeonwon;Lee, Kyunghwang;Yang, Jeonghyeon
    • Journal of the Korean institute of surface engineering / v.55 no.3 / pp.164-172 / 2022
  • Hot-dip galvanized steel (GI) is widely used throughout industry as a corrosion-resistant material. Corrosion of steel is a common phenomenon that results in gradual degradation under various environmental conditions. Corrosion monitoring tracks the progress of this degradation over a long period. Corrosion on steel plate appears as discoloration and irregularities on the surface. This study developed a quantitative evaluation method for the rust formed on GI steel plate, applying a superpixel-based DBSCAN clustering method and k-means clustering to the corroded area in a given image. The superpixel-based DBSCAN clustering method decreases computational cost and achieves automatic segmentation. The color of the rusty surface was analyzed quantitatively in the HSV (Hue, Saturation, Value) color space. In addition, the two segmentation methods are compared for a particular spatial region using their histograms.
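To make the HSV step concrete, the sketch below converts RGB pixels to HSV with Python's standard `colorsys` module and counts pixels that fall in a reddish-brown, saturated range. The hue and saturation thresholds and the sample pixels are illustrative assumptions, not values from the study, and the superpixel and DBSCAN stages are not reproduced.

```python
import colorsys

def to_hsv(pixel):
    """Convert an (R, G, B) pixel with 0-255 channels to (hue in degrees, saturation, value)."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s, v

def rust_fraction(pixels, max_hue_deg=40.0, min_saturation=0.3):
    """Share of pixels whose hue and saturation fall in an assumed 'rust-like' range."""
    rusty = 0
    for pixel in pixels:
        hue, sat, _ = to_hsv(pixel)
        if hue <= max_hue_deg and sat >= min_saturation:
            rusty += 1
    return rusty / len(pixels)

# Made-up pixels: two brownish-orange, two near-gray
pixels = [(183, 65, 14), (200, 200, 200), (190, 80, 20), (210, 210, 215)]
fraction = rust_fraction(pixels)
print(fraction)
```

Working in HSV separates color (hue) from brightness (value), so a threshold on hue and saturation is less sensitive to lighting than one on raw RGB. Gray zinc surfaces have near-zero saturation and are excluded by the saturation check.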

Creating a Smartphone User Recommendation System Using Clustering (클러스터링을 이용한 스마트폰 사용자 추천 시스템 만들기)

  • Jin Hyoung AN
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.1-6 / 2024
  • In this paper, we develop an AI-based recommendation system that matches the specifications of smartphones from company 'S'. The system aims to simplify the complex decision-making process of consumers and guide them to choose the smartphone that best suits their daily needs. The recommendation system analyzes five specifications of smartphones (price, battery capacity, weight, camera quality, capacity) to help users make informed decisions without searching for extensive information. This approach not only saves time but also improves user satisfaction by ensuring that the selected smartphone closely matches the user's lifestyle and needs. The system uses unsupervised learning, specifically clustering (K-means, DBSCAN, and hierarchical clustering), and provides personalized recommendations by evaluating the clusterings with silhouette scores, ensuring accurate and reliable grouping of similar smartphone models. By leveraging advanced data analysis techniques, the system can identify subtle patterns and preferences that might not be immediately apparent to consumers, enhancing the overall user experience. The ultimate goal of this AI recommendation system is to simplify the smartphone selection process, making it more accessible and user-friendly for all consumers. This paper discusses the data collection, preprocessing, development, implementation, and potential impact of the system using Pandas, crawling, scikit-learn, etc., and highlights the benefits of helping consumers explore the various options available and confidently choose the smartphone that best suits their daily lives.
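The silhouette score used here to compare clusterings can be written out directly, as in the sketch below. It is a plain-Python illustration of the standard definition with made-up one-dimensional data. The abstract mentions scikit-learn, whose `sklearn.metrics.silhouette_score` would normally be used instead.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Made-up 1-D feature (e.g. a scaled price) for four phones
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # deliberately mixed grouping
print(round(good, 3), round(bad, 3))
```

Scores near 1 indicate tight, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own. That makes the measure usable for comparing K-means, DBSCAN, and hierarchical results on the same data.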