• Title/Summary/Keyword: K-means Clustering Analysis


Curriculum Mining Analysis Using Clustering-Based Process Mining (군집화 기반 프로세스 마이닝을 이용한 커리큘럼 마이닝 분석)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.4 / pp.45-55 / 2015
  • In this paper, we consider curriculum mining as an application of process mining in the domain of education. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, students' subject registration patterns are highly unstructured and complicated, forming a so-called spaghetti model, because they contain many different cases and a high diversity of behaviors. As a result, registration patterns are typically difficult to develop and analyze. In the literature, there was an effort to handle this issue by clustering based on the features of students and behaviors. However, such features are generally not easy to obtain since they are private and qualitative. Therefore, in this paper, we propose a new curriculum mining framework that applies K-means clustering based on subject attributes to solve the problems caused by the unstructured process model. Specifically, we divide the subject attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. For the categorical attributes, we suggest a method to quantify them by binarization. To choose the number of clusters for K-means clustering, we applied the elbow method using the R-squared value, which represents the ratio of variance that can be explained by a given number of clusters. The performance of the suggested method was verified on a log of student registration data from 'A university' in terms of simplicity and fitness, which are the typical performance measures of a process model obtained in process mining.
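The binarization and elbow steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the subject records, the helper names (`binarize`, `kmeans`, `r_squared`), the farthest-point initialisation, and the use of an unscaled semester column are all assumptions made for the example. R-squared is computed here as one minus the within-cluster sum of squares over the total sum of squares.

```python
def binarize(values):
    """One-hot encode categorical values: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=100):
    # Deterministic farthest-point initialisation (an assumption, for reproducibility).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers

def r_squared(points, labels, centers):
    """Share of total variance explained by the clustering: 1 - WSS / TSS."""
    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1.0 - wss / tss

# Hypothetical subjects: (class classification, semester).
subjects = [("major", 1), ("major", 2), ("elective", 1), ("elective", 2),
            ("major", 7), ("major", 8), ("elective", 7), ("elective", 8)]
onehot = binarize([kind for kind, _ in subjects])
features = [row + [float(sem)] for row, (_, sem) in zip(onehot, subjects)]

r2_by_k = {}
for k in range(1, 5):
    labels, centers = kmeans(features, k)
    r2_by_k[k] = r_squared(features, labels, centers)
    print(k, round(r2_by_k[k], 3))
```

On this toy data R-squared rises from 0 at k = 1 to about 0.92 at k = 2, after which the gains flatten; the elbow method would pick the point where that flattening begins. In practice the numerical columns would likely need rescaling so that they do not dominate the binarized ones.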

A Study on the Clustering method for Analysis of Zeus Botnet Attack Types in the Cloud Environment (클라우드 환경에서 제우스 Botnet 공격 유형 분석을 위한 클러스터링 방안 연구)

  • Bae, Won-il;Choi, Suk-June;Kim, Seong-Jin;Kim, Hyeong-Cheon;Kwak, Jin
    • Journal of Internet Computing and Services / v.18 no.1 / pp.11-20 / 2017
  • Recently, cloud computing technology has been developed and utilized in various fields. While the demand for cloud computing services is increasing, security threats in cloud computing environments are also increasing. In particular, when hosts interconnected in a cloud environment are infected by malware attacks and the infection propagates, it can affect the resources of other hosts and lead to other security threats such as the spread of personal information and data deletion. Therefore, malware analysis for responding to these security threats has been actively studied. This paper proposes a method for clustering the attack types of the Zeus botnet using the k-means clustering algorithm, for the analysis of malware that occurs in cloud environments. By clustering malicious activity according to the type of Zeus botnet occurring in cloud environments, it is possible to determine whether an activity is malware or not. In the future, the goal is to respond to attacks by new types of Zeus botnet that may occur in cloud environments.

Design of Partial Discharge Pattern Classifier of Softmax Neural Networks Based on K-means Clustering : Comparative Studies and Analysis of Classifier Architecture (K-means 클러스터링 기반 소프트맥스 신경회로망 부분방전 패턴분류의 설계 : 분류기 구조의 비교연구 및 해석)

  • Jeong, Byeong-Jin;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.1 / pp.114-123 / 2018
  • This paper concerns the design and learning method of softmax function neural networks based on K-means clustering. The partial discharge data are preliminarily processed through simulation using an Epoxy Mica Coupling sensor and an internal Phase Resolved Partial Discharge Analysis algorithm. The obtained information is processed according to the characteristics of the pattern using a Motor Insulation Monitoring System (MIMS) program. The processed data comprise four types: void discharge, corona discharge, surface discharge, and slot discharge. The partial discharge data with high-dimensional input variables are secondarily processed by principal component analysis and reduced to low-dimensional input variables while keeping the characteristics of the pattern. As a result, the processing speed of the pattern classifier improves. In addition, in the process of extracting the partial discharge data through the MIMS program, the magnitude of amplitude is divided into the maximum value and the average value, and the two resulting pattern characteristics are compared and analyzed. In the first half of the proposed partial discharge pattern classifier, the input and hidden layers are classified by using the K-means clustering method and the output of the hidden layer is obtained. In the latter part, the cross entropy error function is used for parameter learning between the hidden layer and the output layer. The final output layer produces a normalized probability value between 0 and 1 using the softmax function. An advantage of the softmax function is that it can be applied to multi-class problems and admits a probabilistic interpretation. In particular, each output value affects the remaining output values, which accelerates the accompanying learning. Also, to solve the overfitting problem, L2-normalization is applied. To demonstrate the superiority of the proposed pattern classifier, we compare and analyze its classification rate against conventional radial basis function neural networks.
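The output stage described above (softmax outputs trained with a cross entropy error) can be illustrated with a short sketch. It makes several assumptions not stated in the abstract: the hidden-layer output `h` is a made-up placeholder rather than something derived from K-means, the "L2-normalization" is read as an L2 weight penalty with coefficient `lam`, and plain gradient descent with learning rate `lr` stands in for whatever learning rule the authors used.

```python
import math

def softmax(z):
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(W, h, target, lam):
    """Cross entropy of the target class plus an L2 weight penalty."""
    z = [sum(w * x for w, x in zip(row, h)) for row in W]
    p = softmax(z)
    penalty = 0.5 * lam * sum(w * w for row in W for w in row)
    return -math.log(p[target]) + penalty, p

def train_step(W, h, target, lr=0.5, lam=0.01):
    """One gradient-descent update of the hidden-to-output weights (in place)."""
    value, p = loss(W, h, target, lam)
    for k, row in enumerate(W):
        # Gradient of cross entropy w.r.t. logit k is p_k - y_k.
        err = p[k] - (1.0 if k == target else 0.0)
        for j in range(len(row)):
            row[j] -= lr * (err * h[j] + lam * row[j])
    return value  # loss measured before the update

# Hypothetical hidden-layer output and four output classes
# (void, corona, surface, slot discharge); class index 2 is the target.
h = [1.0, 0.5, -0.5]
W = [[0.0] * len(h) for _ in range(4)]
losses = [train_step(W, h, target=2) for _ in range(20)]
probs = loss(W, h, 2, 0.01)[1]
print(round(losses[0], 3), round(losses[-1], 3), [round(p, 3) for p in probs])
```

Because the softmax denominator couples all outputs, raising the probability of one class necessarily lowers the others, which is the interaction between output values that the abstract mentions.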

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / v.20 no.2 / pp.97-105 / 2013
  • Stratification (among other applications) is a popular technique used in survey practice to improve the accuracy of estimators. Its full potential benefit can be gained by the effective use of auxiliary variables related to the survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy in the case of univariate stratification. We then discuss its use for multivariate situations in convenient and efficient ways using three methods: compromised measures of size, principal components analysis, and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.

Pitching grade index in Korean pro-baseball (한국프로야구에서의 투수평가지표)

  • Lee, Jang Taek
    • Journal of the Korean Data and Information Science Society / v.25 no.3 / pp.485-492 / 2014
  • In baseball, the traditional measures of pitchers are wins and ERA. However, these statistics are influenced by luck or team strength. Sabermetricians have therefore proposed a number of indicators that predict future performance. We define a new measure, which we call the pitching grade index (PGI), that efficiently summarizes a pitcher's performance on a numerical scale using principal components analysis. The PGI statistic can often be useful for assessing a pitcher's individual contribution. In addition, the K-means clustering algorithm is used to segment players into groups.

Analysis of Brokerage Commission Policy based on the Potential Customer Value (고객의 잠재가치에 기반한 증권사 수수료 정책 연구)

  • Shin, Hyung-Won;Sohn, So-Young
    • IE interfaces / v.16 no.spc / pp.123-126 / 2003
  • In this paper, we use three clustering algorithms (K-means, Self-Organizing Map, and Fuzzy K-means) to find properly graded stock market brokerage commission rates based on the cumulative transactions on both the stock exchange market and HTS (Home Trading System). Stock trading investors in both modes are classified in terms of total transactions as well as the corresponding mode of investment. Empirical results indicate that fuzzy K-means cluster analysis is the best fit for segmenting the customers of both transaction modes in terms of robustness. We then propose rules for grouping customers into three groups based on a decision tree and apply different brokerage commissions of 0.4%, 0.45%, and 0.5% for the exchange market and 0.06%, 0.1%, and 0.18% for HTS.

Nucleus Recognition of Uterine Cervical Pap-Smears using FCM Clustering Algorithm

  • Kim, Kwang-Baek
    • Journal of information and communication convergence engineering / v.6 no.1 / pp.94-99 / 2008
  • Segmentation of the nucleus region in uterine cervical cytodiagnosis images is known as the most difficult and important part of an automatic cervical cancer recognition system. In this paper, the nucleus region is extracted from a uterine cervical cytodiagnosis image using the HSI model. The characteristics of the nucleus are extracted by analyzing morphometric, densitometric, colorimetric, and textural features based on the detected nucleus region. The classification criterion for a nucleus is defined according to the standard categories of the Bethesda system. The fuzzy C-means clustering algorithm is applied to the extracted nucleus, and the results show that the proposed method is efficient in nucleus recognition and uterine cervical Pap-smear extraction.
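For readers unfamiliar with fuzzy C-means (FCM), the sketch below shows the standard alternating update of soft memberships and cluster centers. It is a generic illustration, not the paper's pipeline: the two-dimensional feature vectors, the initial centers, and the fuzzifier `m = 2` are assumptions chosen for the example.

```python
import math

def memberships(points, centers, m=2.0, eps=1e-12):
    """Soft membership of every point in every cluster; each row sums to 1."""
    U = []
    for p in points:
        # eps guards against division by zero when a point sits on a center.
        d = [max(math.dist(p, c), eps) for c in centers]
        U.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
                  for i in range(len(d))])
    return U

def update_centers(points, U, m=2.0):
    """Each center is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for i in range(len(U[0])):
        weights = [row[i] ** m for row in U]
        total = sum(weights)
        centers.append([sum(w * p[t] for w, p in zip(weights, points)) / total
                        for t in range(dim)])
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    for _ in range(iters):
        U = memberships(points, centers, m)
        centers = update_centers(points, U, m)
    return memberships(points, centers, m), centers

# Hypothetical two-feature descriptors for six detected regions.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
U, centers = fuzzy_c_means(points, centers=[points[0], points[3]])
for p, row in zip(points, U):
    print(p, [round(u, 3) for u in row])
```

Unlike K-means, every point keeps a graded membership in each cluster, which is useful when regions are ambiguous between categories.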

Probabilistic reduced K-means cluster analysis (확률적 reduced K-means 군집분석)

  • Lee, Seunghoon;Song, Juwon
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.905-922 / 2021
  • Cluster analysis is one of the unsupervised learning techniques used for discovering clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the structure of clusters, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulation shows that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. When applied to a real data set, it revealed a similar or better cluster structure compared to other methods.
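The masking effect mentioned in this abstract can be seen on a small constructed example. The sketch below is only an illustration of why tandem analysis can fail, not the proposed probabilistic reduced K-means: the data are hypothetical, with two clusters separated along `x1` and a high-variance `x2` that carries no cluster information. The leading principal axis of the 2x2 covariance matrix is obtained in closed form.

```python
import math

# Two clusters that differ only in x1; x2 is high-variance noise shared by both.
noise = (-10.0, -5.0, 0.0, 5.0, 10.0)
cluster_a = [(-1.0, x2) for x2 in noise]
cluster_b = [(1.0, x2) for x2 in noise]
data = cluster_a + cluster_b

def first_pc(points):
    """Leading principal axis of 2-D data via the closed-form 2x2 eigenvector."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)

def project(points, axis, center):
    """Scores of the centered points on the given axis."""
    return [(p[0] - center[0]) * axis[0] + (p[1] - center[1]) * axis[1]
            for p in points]

pc1, center = first_pc(data)
scores_a = project(cluster_a, pc1, center)
scores_b = project(cluster_b, pc1, center)

# Distance between the two cluster means, on PC1 versus on the original x1.
gap_on_pc1 = abs(sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b))
gap_on_x1 = abs(sum(p[0] for p in cluster_a) / 5 - sum(p[0] for p in cluster_b) / 5)
print("PC1 direction:", pc1)
print("gap on PC1:", gap_on_pc1, "gap on x1:", gap_on_x1)
```

Here the first principal component aligns with the noisy `x2` axis, so a one-dimensional tandem analysis would run K-means on scores in which the two clusters overlap completely, even though they are cleanly separated along `x1`.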

An Empirical Comparison and Verification Study on the Containerports Clustering Measurement Using K-Means and Hierarchical Clustering(Average Linkage Method Using Cross-Efficiency Metrics, and Ward Method) and Mixed Models (K-Means 군집모형과 계층적 군집(교차효율성 메트릭스에 의한 평균연결법, Ward법)모형 및 혼합모형을 이용한 컨테이너항만의 클러스터링 측정에 대한 실증적 비교 및 검증에 관한 연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association / v.34 no.3 / pp.17-52 / 2018
  • The purpose of this paper is to measure the clustering change and analyze empirical results. Additionally, by using k-means, hierarchical, and mixed models on Asian container ports over the period 2006-2015, the study aims to form a cluster comprising Busan, Incheon, and Gwangyang ports. The models consider the number of cranes, depth, berth length, and total area as inputs and container twenty-foot equivalent units (TEU) as output. The main empirical results are as follows. First, the ranking by rate of increase over the 10-year analysis period shows that the values for average linkage (AL), mixed Ward, rule of thumb (RT) & elbow, Ward, and mixed AL are 42.04% up, 35.01% up, 30.47% up, and 23.65% up, respectively. Second, according to the RT and elbow models, the three Korean ports can be clustered with Asian ports in the following manner: Busan Port (Hong Kong, Guangzhou, Qingdao, and Singapore), Incheon Port (Tokyo, Nagoya, Osaka, Manila, and Bangkok), and Gwangyang Port (Guangzhou, Ningbo, Qingdao, and Kaohsiung). Third, the optimal numbers of clusters are as follows: AL (6), mixed Ward (5), RT & elbow (4), Ward (5), and mixed AL (6). Fourth, the empirical clustering results match those of the questionnaire as follows: Busan Port (80%), Incheon Port (17%), and Gwangyang Port (50%). The policy implication is that the parties concerned with Korean seaports should introduce port improvement plans such as benchmarking the clustered seaports.

Development of Personalized Recommendation System using RFM method and k-means Clustering (RFM기법과 k-means 기법을 이용한 개인화 추천시스템의 개발)

  • Cho, Young-Sung;Gu, Mi-Sug;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.17 no.6 / pp.163-172 / 2012
  • Collaborative filtering, the explicit method used in existing recommendation systems, not only fails to reflect the exact attributes of items but also still suffers from sparsity and scalability problems, although it has been used in practice with efforts to remedy these defects. This paper proposes a personalized recommendation system using the RFM method and k-means clustering for u-commerce, which requires real-time accessibility and agility. Using an implicit method that avoids complicated query processing of requests and responses for ratings, the system relies on RFM analysis and k-means clustering to reflect item attributes and find items with high purchasability. The proposed system performs clustering on feature vectors built from customer information, calculates the preference for each item category based on purchase history data, and is thereby able to recommend items efficiently. To estimate performance, the proposed system is compared with an existing system. The results, obtained through experiments on a dataset collected from an internet cosmetics shopping mall, show an improvement when evaluated according to the criteria of logicality.
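RFM stands for recency, frequency, and monetary value. As a rough sketch of how such features might be derived from purchase history before clustering, the example below computes one RFM vector per customer. The purchase records, the reference date, and the function name `rfm_table` are hypothetical; the paper's actual scoring and weighting scheme is not given in the abstract.

```python
from collections import defaultdict
from datetime import date

def rfm_table(purchases, today):
    """Map each customer to (recency in days, purchase count, total spend)."""
    history = defaultdict(list)
    for customer, day, amount in purchases:
        history[customer].append((day, amount))
    table = {}
    for customer, rows in history.items():
        last_purchase = max(day for day, _ in rows)
        recency = (today - last_purchase).days      # smaller means more recent
        frequency = len(rows)                       # number of purchases
        monetary = sum(amount for _, amount in rows)  # total amount spent
        table[customer] = (recency, frequency, monetary)
    return table

# Hypothetical purchase log: (customer id, purchase date, amount).
purchases = [
    ("c1", date(2012, 1, 5), 30.0),
    ("c1", date(2012, 3, 1), 20.0),
    ("c2", date(2012, 2, 10), 100.0),
]
rfm = rfm_table(purchases, today=date(2012, 3, 31))
for customer, vector in sorted(rfm.items()):
    print(customer, vector)
```

The resulting three-component vectors could then serve as input features for k-means, typically after rescaling so that the monetary column does not dominate the distance.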