• Title/Summary/Keyword: K-means++ algorithm

Analysis of Academic Achievement Data Using AI Cluster Algorithms (AI 군집 알고리즘을 활용한 학업 성취도 데이터 분석)

  • Koo, Dukhoi;Jung, Soyeong
    • Journal of The Korean Association of Information Education / v.25 no.6 / pp.1005-1013 / 2021
  • With the COVID-19 pandemic dragging on, the existing academic gap is widening. This study aims to give homeroom teachers a visual view of the academic achievement gap within grades and classrooms through an analysis of achievement data, and to use that view to help them design lessons and explore ways to narrow the gap. Students' Korean and math diagnostic evaluation scores from the beginning of the school year were visualized as clusters using the K-means algorithm, and meaningful clusters were confirmed to form. Teacher interviews further confirmed that the system was useful for narrowing the achievement gap, for example by checking students' learning levels and academic achievement and by designing classes such as individual supplementary instruction and level-specific learning. This indicates that the academic achievement data analysis system helps narrow the academic gap. The study offers homeroom teachers practical help in exploring ways to narrow the gap within grades and classes, and is expected ultimately to contribute to narrowing it.
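
The clustering step described in the abstract above can be sketched in plain Python. This is a minimal illustration of standard K-means (Lloyd's iterations), not the authors' system: the score values, the choice of two clusters, and the initial centers are all invented for the example.

```python
import math

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's k-means: alternate an assignment step and an update step."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return labels, centers

# Hypothetical (Korean, math) diagnostic scores, invented for illustration.
scores = [(90, 95), (85, 88), (92, 91), (55, 60), (50, 52), (58, 49)]
# Assumption: two clusters, seeded with two of the data points.
labels, centers = kmeans(scores, centers=[scores[0], scores[3]])
```

Here `labels` gives a cluster index per student, which is the kind of grouping a teacher-facing scatter plot could color by. A K-means++ seeding step could replace the hand-picked initial centers.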

An Implementation of Security System Using Speaker Recognition Algorithm (화자인식 알고리즘을 이용한 보안 시스템 구축)

  • Shin, You-Shik;Park, Kee-Young;Kim, Chong-Kyo
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.4 / pp.17-23 / 1999
  • This paper describes a security system that uses a text-independent speaker recognition algorithm. The system is built on a PIC16F84 and a sound card. The speaker recognition algorithm uses a k-means based model, with weighted cepstrum as the speech feature. In the experiments, the recognition rate was 100% on the training data and 99% on non-training data. In addition, for 5 registered persons, the false rejection rate was 1%, the false acceptance rate was 0%, and the mean verification error rate was 0.5%.

Classification of Fuzzy Logic on the Optimized Bead Geometry in the Gas Metal Arc Welding

  • Yu Xue;Kim, Ill-Soo;Park, Chang-Eun;Kim, In-Ju;Son, Joon-Sik
    • Proceedings of the Korean Society of Machine Tool Engineers Conference / 2004.10a / pp.225-232 / 2004
  • Recent rapid progress in computer technology has driven the development of automated welding systems that use Artificial Intelligence (AI). However, a fully automated welding system has not yet been achieved due to difficulties with control and sensor technologies. This paper presents a fuzzy-logic classification of the optimized bead geometry, such as bead width, height, penetration, and bead area, in Gas Metal Arc (GMA) welding. The fuzzy C-Means (FCM) algorithm, which is best known as an unsupervised fuzzy clustering algorithm, is used to analyze the bead geometry of the specimens. The quality of GMA welding can then be classified with this fuzzy clustering technique, and the choice for obtaining the optimal bead geometry can also be determined.

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3714-3732 / 2019
  • With the rapid development of networks, the Intrusion Detection System (IDS) plays an increasingly important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive amounts of data are generated. On large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes the IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. We then select representative instances from each cluster to perform data reduction, and use the clusters of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters by the distance between the test sample and each cluster index and obtain the k nearest clusters, within which we find the k nearest neighbors. Experimental results show that searching for neighbors via cluster indexes significantly reduces computational complexity, and that classification on the reduced data of representative instances improves efficiency while maintaining high accuracy.
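
The detection stage described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the clusters are taken as already built (the Mini Batch K-Means training stage and the selection of representative instances are skipped), and the function name `cluster_knn_predict`, the feature vectors, and the labels are hypothetical.

```python
import math
from collections import Counter

def cluster_knn_predict(sample, clusters, k):
    """Classify `sample` by searching only the k clusters nearest to it.

    `clusters` is a list of (center, instances) pairs, where each instance
    is a (feature_vector, label) pair. The center serves as the cluster index.
    """
    # Sort clusters by the distance between the sample and each cluster index.
    nearest_clusters = sorted(clusters, key=lambda c: math.dist(sample, c[0]))[:k]
    # Pool the representative instances of those clusters only.
    candidates = [inst for _, instances in nearest_clusters for inst in instances]
    # Find the k nearest neighbours among the pooled candidates.
    neighbours = sorted(candidates, key=lambda inst: math.dist(sample, inst[0]))[:k]
    # Majority vote over the neighbours' labels.
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical pre-built clusters: centers and representative instances are invented.
clusters = [
    ((0.0, 0.0), [((0.1, 0.0), "normal"), ((0.0, 0.2), "normal")]),
    ((5.0, 5.0), [((5.1, 4.9), "attack"), ((4.8, 5.2), "attack")]),
    ((9.0, 0.0), [((9.1, 0.1), "normal"), ((8.9, 0.0), "attack")]),
]
prediction = cluster_knn_predict((4.9, 5.0), clusters, k=2)
```

The saving comes from computing distances to the cluster indexes first, so that only the instances in the k nearest clusters are compared against the test sample.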

An Efficient Clustering Algorithm based on Heuristic Evolution (휴리스틱 진화에 기반한 효율적 클러스터링 알고리즘)

  • Ryu, Joung-Woo;Kang, Myung-Ku;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.80-90 / 2002
  • Clustering is a useful technique for grouping data points so that points within a single group, or cluster, have similar characteristics. Many clustering algorithms have been developed and used in engineering applications, including pattern recognition and image processing. Recently, clustering has drawn increasing attention as an important technique in data mining. However, clustering algorithms such as K-means and Fuzzy C-means have two difficulties: the number of clusters must be determined a priori, and the clustering result depends on the initial set of clusters, so a poor initial set fails to yield desirable results. In this paper, we propose a new clustering algorithm that solves these problems. Our method uses an evolutionary algorithm to solve the local optima problem, in which clustering that starts from an inappropriate set of clusters converges to an undesirable state. We also adopt a new measure of how well the data are clustered, defined in terms of both intra-cluster dispersion and inter-cluster separability. Using this measure, our method determines the number of clusters automatically as the result of the optimization process. In addition, we combine a heuristic, that is, problem-specific knowledge, with the evolutionary algorithm to speed up its search. We tested our algorithm on several sets of multi-dimensional data, and the results show that our algorithm outperforms the existing algorithms.

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / v.20 no.2 / pp.97-105 / 2013
  • Stratification (among other applications) is a popular technique used in survey practice to improve the accuracy of estimators. Its full benefit is gained when auxiliary variables related to the survey variables are used effectively in stratification. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy for univariate stratification. We then discuss convenient and efficient ways to use it in multivariate situations through three methods: compromised measures of size, principal components analysis, and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.

Vector Quantization using Genetic Algorithm (유전자 알고리즘을 이용한 벡터 양자화)

  • 임현택
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.197-200 / 1998
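  • This paper proposes a method for performing vector quantization (VQ) using a genetic algorithm. When a codebook is generated by vector quantization, a quantization error inevitably arises between the generated codebook and the training vectors, and codebooks generated with the conventional K-means algorithm are limited in how far this error can be reduced. The genetic-algorithm-based vector quantization proposed in this paper was studied in order to reduce this quantization error. To evaluate the proposed method, a codebook was generated from speech data using the Minimax method, one of the methods for selecting cluster centers in the conventional K-means algorithm, and its quantization error was compared with that of the proposed method. The comparison showed that the quantization error was reduced.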
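
The quantization error that the abstract above compares across codebooks can be sketched as follows. The abstract does not state which distortion measure the paper uses, so this sketch assumes the mean squared Euclidean distance from each training vector to its nearest codeword; the vectors and both codebooks are invented.

```python
import math

def quantization_error(training_vectors, codebook):
    """Mean squared distance from each training vector to its nearest codeword."""
    total = 0.0
    for v in training_vectors:
        # Each vector is quantized to the closest codeword in the codebook.
        total += min(math.dist(v, c) ** 2 for c in codebook)
    return total / len(training_vectors)

# Hypothetical 2-D training vectors and two candidate codebooks.
vectors = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
codebook_a = [(0.0, 1.0), (4.0, 1.0)]  # codewords placed at the two group means
codebook_b = [(2.0, 0.0), (2.0, 2.0)]  # a poorer placement of the same size

error_a = quantization_error(vectors, codebook_a)
error_b = quantization_error(vectors, codebook_b)
```

Under this assumed measure, a function like `quantization_error` could also serve as the fitness that a genetic algorithm minimizes over candidate codebooks, though the paper's actual fitness function is not given in the abstract.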

An optimal feature selection algorithm for the network intrusion detection system (네트워크 침입 탐지를 위한 최적 특징 선택 알고리즘)

  • Jung, Seung-Hyun;Moon, Jun-Geol;Kang, Seung-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.342-345 / 2014
  • The accuracy and efficiency of a network intrusion detection system based on machine learning depend heavily on the selected features. However, choosing the optimal combination from the commonly used features requires extensive computing resources: the number of possible feature combinations from $n$ given features is $2^n-1$. To tackle this problem, we propose an optimal feature selection algorithm. The proposed algorithm is based on local search, a representative meta-heuristic for solving optimization problems. The accuracy of the clusters obtained by applying the k-means clustering algorithm to the selected feature components is used to evaluate a feature combination. To assess the performance of the proposed algorithm, we compare it with a method that uses all features, using the NSL-KDD data set and a multi-layer perceptron.
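
The $2^n-1$ count in the abstract above is the number of non-empty subsets of $n$ features. A small enumeration makes it concrete and shows why exhaustive search becomes infeasible as $n$ grows. The four feature names are illustrative only, and the helper `nonempty_feature_subsets` is hypothetical.

```python
from itertools import combinations

def nonempty_feature_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)

# Illustrative feature names; any n distinct features give the same count.
features = ["duration", "protocol_type", "src_bytes", "dst_bytes"]
subsets = list(nonempty_feature_subsets(features))

# The enumeration matches the closed form 2^n - 1.
assert len(subsets) == 2 ** len(features) - 1
```

The count doubles with every added feature, which is the motivation for replacing exhaustive enumeration with a local search over feature combinations.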

A Study on Nucleus Recognition of Uterine Cervical Pap-Smears using Fuzzy c-Means Clustering Algorithm (퍼지 c-Means 클러스터링 알고리즘을 이용한 자궁 세포진 핵 인식에 관한 연구)

  • Heo, Jung-Min;Kim, Jung-Min;Kim, Kwang-Baek
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.403-407 / 2005
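  • Segmentation of the nucleus region in uterine cervical Pap-smear images is known to be the most difficult and most important part of an automated cervical cancer screening system. In this paper, the nucleus region is extracted from uterine cervical Pap-smear images using the HSI model. Nucleus features are then extracted from this region by analyzing morphometric, densitometric, colorimetric, and textural features. In addition, nucleus classification criteria are defined according to the classification criteria of the Bethesda System, and the extracted nucleus features are applied to the fuzzy c-Means clustering algorithm. The experimental results confirm that the proposed method is effective for extracting and recognizing nuclei in uterine Pap smears.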

A Machine Learning Program for Impact Fracture Analysis (머신러닝을 이용한 충격파면 해석에 관한 연구)

  • Lee, Seung-Jin;Kim, Gi-Man;Choi, Seong-Dae
    • Journal of the Korean Society of Manufacturing Process Engineers / v.20 no.1 / pp.95-102 / 2021
  • Analysis of the fracture surface is one of the most important methods for determining the cause of structural failure in equipment. In industrial settings, it is necessary to know whether a structural failure was caused by impact or by fatigue. In ferrous and non-ferrous metals, two fracture phenomena appear on the fracture surface: ductile and brittle fracture. In this study, machine learning is used to predict whether a fracture is ductile or brittle when structural failure is caused by impact. The K-means algorithm computes the ratio of the two fracture types by clustering the brittle and ductile fracture data from a photograph of the impact fracture surface, unlike the existing method, which computes the fracture surface ratio by comparison with a grid or a reference fracture surface shape.
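
A rough sketch of how a two-cluster K-means could yield such a ratio is given below. The abstract does not say which image features the authors cluster, so this assumes grayscale pixel intensity as the only feature and assumes that the brighter cluster corresponds to brittle fracture; both assumptions, the pixel values, and the helper `two_means_1d` are hypothetical.

```python
def two_means_1d(values, iterations=50):
    """Split scalar values into a low and a high cluster with 2-means."""
    low, high = min(values), max(values)  # seed the centers at the extremes
    low_group, high_group = list(values), []
    for _ in range(iterations):
        # Assign each value to the nearer of the two centers.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # centers are stable
        low, high = new_low, new_high
    return low_group, high_group

# Hypothetical grayscale intensities sampled from a fracture-surface photograph.
pixels = [30, 35, 40, 45, 200, 210, 220, 230, 225, 215]
dark, bright = two_means_1d(pixels)

# Assumption: the bright cluster is the brittle region, the dark one ductile.
brittle_fraction = len(bright) / len(pixels)
```

On a real photograph the same idea would run over every pixel, and the fraction of pixels in each cluster would stand in for the area ratio that the grid method estimates by hand.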