• Title/Summary/Keyword: K-Means clustering algorithm


Classification of basin characteristics related to inundation using clustering (군집분석을 이용한 침수관련 유역특성 분류)

  • Lee, Han Seung; Cho, Jae Woong; Kang, Ho seon; Hwang, Jeong Geun; Moon, Hae Jin
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.96-96 / 2020
  • To establish risk criteria for inundation caused by typhoons or heavy rainfall, research is underway to predict the limit rainfall from basin characteristics using artificial intelligence algorithms. To improve model performance in estimating the limit rainfall, the training data are pre-processed before use. When 50.0% of the entire data set was removed as outliers during pre-processing, the accuracy exceeded 90%; however, the utilization rate of the training data becomes very low, so diverse basin characteristics cannot be considered. Accordingly, to predict the limit rainfall while reflecting diverse watershed characteristics and raising the utilization rate of the training data, watersheds with similar characteristics were clustered. The algorithms used for clustering are K-Means, Agglomerative, DBSCAN, and Spectral Clustering. K-Means, DBSCAN, and Agglomerative clustering group the watersheds mainly by impervious-area ratio, whereas Spectral clustering produces various groupings depending on its parameters. Applying the clustering results to the limit-rainfall prediction algorithm is expected to allow diverse watershed characteristics to be considered while also improving prediction performance.

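As a rough illustration of the comparison described above (not the authors' code), the sketch below runs the four named algorithms on synthetic stand-in data for basin characteristics; the feature meanings and parameter values are assumptions.

```python
# Illustrative sketch only: compare the four clustering algorithms named in
# the abstract on synthetic stand-in data for basin characteristics.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# three hypothetical basin features (e.g. impervious-area ratio, area, mean slope)
X, _ = make_blobs(n_samples=120, n_features=3, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=4),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Spectral": SpectralClustering(n_clusters=4, affinity="nearest_neighbors",
                                   n_neighbors=10, random_state=0),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    n_clusters = len(set(labels) - {-1})   # -1 marks DBSCAN noise points
    print(f"{name}: {n_clusters} clusters")
```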

Incremental Clustering Algorithm by Modulating Vigilance Parameter Dynamically (경계변수 값의 동적인 변경을 이용한 점층적 클러스터링 알고리즘)

  • 신광철; 한상용
    • Journal of KIISE: Software and Applications / v.30 no.11 / pp.1072-1079 / 2003
  • This study proposes a new clustering algorithm that enables incremental categorization of large document collections. The proposed algorithm combines the strengths of the spherical k-means algorithm, which clusters massive amounts of high-dimensional documents, and the fuzzy ART (adaptive resonance theory) neural network, which clusters incrementally. In short, it brings together the vector space model and concept vectors of spherical k-means with the vigilance parameter of fuzzy ART. The new algorithm not only supports incremental clustering and automatically determines an appropriate number of clusters, but also mitigates overfitting caused by outliers and noise. On the objective function value, which measures cluster coherence and is used to evaluate the quality of the produced clusters, tests on the CLASSIC3 data set showed that the proposed algorithm outperforms spherical k-means by 8.04% on average.
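
The incremental idea can be sketched as follows; this is a minimal illustration assuming unit-normalized document vectors and a cosine-similarity vigilance threshold, not the authors' implementation.

```python
# Minimal sketch: a document joins the most similar concept vector only if the
# cosine similarity clears a vigilance threshold; otherwise it seeds a new
# cluster. The vigilance value and the random stand-in data are assumptions.
import numpy as np

def incremental_cluster(docs, vigilance=0.5):
    centers = []          # concept vectors, kept at unit length
    assignments = []
    for x in docs:
        x = x / np.linalg.norm(x)
        if centers:
            sims = [c @ x for c in centers]
            best = int(np.argmax(sims))
        if not centers or sims[best] < vigilance:
            centers.append(x)                       # open a new cluster
            assignments.append(len(centers) - 1)
        else:
            # pull the concept vector toward the document and re-normalize
            centers[best] = centers[best] + x
            centers[best] /= np.linalg.norm(centers[best])
            assignments.append(best)
    return centers, assignments

docs = np.abs(np.random.default_rng(1).normal(size=(20, 8)))  # stand-in for TF-IDF rows
centers, labels = incremental_cluster(docs, vigilance=0.6)
print(len(centers), "clusters")
```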

K-Means-Based Polynomial-Radial Basis Function Neural Network Using Space Search Algorithm: Design and Comparative Studies (공간 탐색 최적화 알고리즘을 이용한 K-Means 클러스터링 기반 다항식 방사형 기저 함수 신경회로망: 설계 및 비교 해석)

  • Kim, Wook-Dong; Oh, Sung-Kwun
    • Journal of Institute of Control, Robotics and Systems / v.17 no.8 / pp.731-738 / 2011
  • In this paper, we introduce an advanced architecture of K-Means clustering-based polynomial Radial Basis Function Neural Networks (p-RBFNNs) designed with the aid of the Space Search Optimization Algorithm (SSOA) and develop a comprehensive design methodology supporting their construction. To design the optimized p-RBFNNs, the center of each receptive field is first determined by the K-Means clustering algorithm, and then the center and width of the corresponding receptive field are optimized through SSOA. The connections (weights) of the proposed p-RBFNNs are of functional character and are realized by considering three types of polynomials. In addition, Weighted Least Squares Estimation (WLSE) is used to estimate the coefficients of the polynomials serving as the functional connections from each node to the output node. As a result, the local learning capability and interpretability of the proposed model are improved. The model is illustrated using a nonlinear function and the NOx machine-learning dataset. A comparative analysis reveals that the proposed model exhibits higher accuracy and strong predictive capability compared with previous models in the literature.
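
A minimal sketch of the K-Means-seeded RBF idea is shown below; it fits plain least-squares output weights on a toy regression task and omits the paper's SSOA optimization, polynomial connections, and WLSE.

```python
# Sketch only: K-Means supplies receptive-field centers, widths come from
# cluster spreads, and output weights are fitted by ordinary least squares.
import numpy as np
from sklearn.cluster import KMeans

def rbf_features(X, centers, widths):
    # Gaussian receptive-field activations for every sample/center pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)      # toy nonlinear target

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_
# width of each receptive field: mean distance of its members to the center
widths = np.array([
    np.linalg.norm(X[km.labels_ == k] - centers[k], axis=1).mean() + 1e-6
    for k in range(8)
])
Phi = rbf_features(X, centers, widths)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = Phi @ w
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```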

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung; Chung, Ji-Won; Choi, In-Chan
    • Asia Pacific Journal of Information Systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters as well as to assess the clustering results. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature based on their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. Through the experiments, the effects of varying the number of points, attributes, and clusters on performance are analyzed. The simulation results show that the Calinski-Harabasz index performs best across all datasets and that the Davies-Bouldin index becomes a strong competitor as the number of points in a dataset increases.
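
The comparison idea can be illustrated with scikit-learn's implementations of two of the indices named above; the synthetic data and the candidate range of k are assumptions.

```python
# Sketch: run K-Means for several candidate k and score each partition with
# the Calinski-Harabasz and Davies-Bouldin indices.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    ch = calinski_harabasz_score(X, labels)   # higher is better
    db = davies_bouldin_score(X, labels)      # lower is better
    print(f"k={k}  CH={ch:8.1f}  DB={db:.3f}")
```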

A Method of Detecting the Aggressive Driving of Elderly Driver (노인 운전자의 공격적인 운전 상태 검출 기법)

  • Koh, Dong-Woo; Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.537-542 / 2017
  • Aggressive driving is a major cause of car accidents. Previous studies have mainly analyzed the aggressive driving tendencies of young drivers, and only with pure clustering or classification techniques from machine learning. Since elderly people have different driving habits owing to their fragile physical condition, a new method, such as enhancing the characteristics of the driving data, is needed to properly analyze aggressive driving by elderly drivers. In this study, acceleration data collected from a smartphone in a moving vehicle are analyzed by a newly proposed ECA (Enhanced Clustering method for Acceleration data) technique, coupled with conventional clustering techniques (K-means clustering and the Expectation-Maximization algorithm). ECA selects the high-intensity data within the cluster groups detected by K-means and EM across all subjects' data and models the characteristic data through scaled values. Using this method, aggressive driving data were obtained for both the young and the elderly experiment participants, unlike with the pure clustering methods. We further found that K-means clustering has higher detection efficiency than the EM method, and the K-means results show that the driving intensity of a young driver is 1.29 times higher than that of an elderly driver. In conclusion, the proposed method can detect aggressive driving maneuvers from the data of elderly drivers, who operate at low intensity, and can support a customized safe-driving system for elderly drivers. In the future, it will be possible to detect abnormal driving conditions and to use the collected data to warn drivers early.
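
A hedged sketch of the general workflow (not the paper's ECA) is given below: acceleration magnitudes are clustered with K-means and a Gaussian mixture fitted by EM, and the highest-mean cluster is treated as the aggressive-driving candidate; the data and the selection rule are assumptions.

```python
# Illustrative sketch only: cluster acceleration magnitudes with K-Means and a
# Gaussian mixture (EM), then keep the highest-intensity cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
accel = np.concatenate([rng.normal(1.0, 0.2, 500),    # calm driving (g)
                        rng.normal(2.5, 0.4, 50)])    # hard maneuvers (g)
X = accel.reshape(-1, 1)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

for name, labels in [("K-Means", km_labels), ("EM", em_labels)]:
    means = [X[labels == c].mean() for c in np.unique(labels)]
    strong = int(np.argmax(means))
    print(f"{name}: high-intensity cluster mean = {means[strong]:.2f} g, "
          f"size = {(labels == strong).sum()}")
```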

Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm (K-means 알고리즘을 이용한 계층적 클러스터링에서의 클러스터 계층 깊이 선택)

  • Lee, Won-Hee; Lee, Shin-Won; Chung, Sung-Jong; An, Dong-Un
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.150-156 / 2008
  • Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity, whereas K-means keeps the time complexity low even with a large number of variables. Considering simplicity, quality, and efficiency together, we combine the two approaches in a new system, CONDOR, which builds a hierarchical structure for document clustering using the K-means algorithm. We evaluated performance for different hierarchy depths and numbers of initial (uncertain) centroids while varying the relative number of documents corresponding to the given queries. Compared with the conventional method in which the initial centroids are fixed in advance, the performance of our method is considerably improved.
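
The idea of a K-means-built hierarchy can be sketched as follows; this recursive, bisection-style example illustrates the general approach, not the CONDOR system itself, and the depth and branching values are assumptions.

```python
# Sketch: build a cluster hierarchy by applying K-Means recursively down to a
# chosen depth, avoiding the quadratic cost of classic agglomerative clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def recursive_kmeans(X, depth, k=2, min_size=10):
    """Return the leaf clusters (index arrays) of a K-Means hierarchy."""
    def split(indices, level):
        if level == depth or len(indices) < k * min_size:
            return [indices]
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[indices])
        leaves = []
        for c in range(k):
            leaves.extend(split(indices[labels == c], level + 1))
        return leaves
    return split(np.arange(len(X)), 0)

X, _ = make_blobs(n_samples=400, centers=6, random_state=0)
for depth in (1, 2, 3):
    leaves = recursive_kmeans(X, depth)
    print(f"depth {depth}: {len(leaves)} leaf clusters")
```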

Combined Artificial Bee Colony for Data Clustering (융합 인공벌군집 데이터 클러스터링 방법)

  • Kang, Bum-Su; Kim, Sung-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.203-210 / 2017
  • Data clustering is one of the most difficult and challenging problems and can be formally regarded as a particular kind of NP-hard grouping problem. The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it can easily become trapped in local optima and shows high variation across solutions obtained from different initializations on large data sets. We therefore need efficient computational-intelligence methods that find a globally good solution to the data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) that uses K-means for initialization and finalization to find solutions that are effective on the data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent foraging behavior of honeybees; its performance is better than or similar to that of other population-based algorithms while employing fewer control parameters. The proposed CABC method provides near-optimal solutions within reasonable time by balancing convergent and diversified search. Our experiments and analysis on clustering problems demonstrate that CABC is competitive with previous partitioning approaches in terms of solution quality. We validate the performance of CABC on the Iris, Wine, Glass, Vowel, and Cloud datasets from the UCI machine learning repository, comparing against previous studies. In our simulations, the proposed KABCK (K-means + ABC + K-means) outperforms ABCK (ABC + K-means), KABC (K-means + ABC), ABC, and K-means.
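
A heavily simplified sketch of the K-means-initialize / global-search / K-means-finalize scheme is shown below; the random perturbation loop is only a crude stand-in for the ABC search, not an ABC implementation.

```python
# Simplified sketch of the KABCK structure: K-Means init, a global-search
# phase (stand-in for ABC), then K-Means finalization from the refined centers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

def sse(X, centers):
    # within-cluster sum of squared distances to the nearest center
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

X = load_iris().data
k = 3
rng = np.random.default_rng(0)

best = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_

# global-search phase: keep random perturbations that lower the SSE
for _ in range(200):
    cand = best + rng.normal(scale=0.05, size=best.shape)
    if sse(X, cand) < sse(X, best):
        best = cand

final = KMeans(n_clusters=k, init=best, n_init=1).fit(X)
print("final SSE:", round(sse(X, final.cluster_centers_), 3))
```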

Zone Clustering Using a Genetic Algorithm and K-Means (유전자 알고리듬과 K-평균법을 이용한 지역 분할)

  • 임동순; 오현승
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.1 / pp.1-16 / 1998
  • The zone clustering problem, which arises in several areas such as deciding the optimal locations of ambient measuring stations, is to divide a two-dimensional region into several sub-areas in which the included zones show similar properties. In general, the optimal solution of this problem is very hard to obtain; therefore, instead of finding an optimal solution, generating a near-optimal solution within a limited time is more meaningful. In this study, a combination of a genetic algorithm and a modified k-means method is used to obtain near-optimal solutions. To exploit the genetic algorithm effectively, a chromosome representation and appropriate genetic operators are proposed. The k-means method, originally devised to solve the object clustering problem, is modified to improve the solutions obtained from the genetic algorithm. The experiments show that the proposed method generates near-optimal solutions efficiently.

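A sketch of only the refinement step is given below, under the assumption that a genetic algorithm supplies a candidate partition of 2-D zone coordinates; the GA itself and any zone-contiguity constraints are not shown.

```python
# Sketch: centroids implied by a (hypothetical) GA partition seed a K-Means
# run that locally improves the partition.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
zones = rng.uniform(0, 10, size=(100, 2))        # 2-D zone coordinates
ga_labels = rng.integers(0, 4, size=100)         # stand-in for a GA chromosome

# centroids implied by the GA partition
seed_centers = np.vstack([zones[ga_labels == c].mean(axis=0) for c in range(4)])

refined = KMeans(n_clusters=4, init=seed_centers, n_init=1).fit(zones)
print("refined inertia:", round(refined.inertia_, 2))
```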

A Study on Web-User Clustering Algorithm for Web Personalization (웹 개인화를 위한 웹사용자 클러스터링 알고리즘에 관한 연구)

  • Lee, Hae-Kag
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.5 / pp.2375-2382 / 2011
  • User clustering for web navigation pattern discovery is very useful for obtaining the preferences and behavior patterns of users on web pages, and the information it provides is essential for web personalization and customer grouping. In this paper, an algorithm for clustering the web navigation paths of users is proposed, with which special navigation patterns can be recognized. The proposed algorithm has two clustering phases. In the first phase, all paths are classified into k groups on the basis of their similarities. The solution obtained in this phase is not a global optimum, but it provides a good, feasible starting point for the second phase. In the second phase, the first-phase solution is improved by a revised k-means algorithm in which paths are grouped by hyperplanes rather than by the distance between a path and a group center. Experimental results show that the proposed method is more efficient.
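
A minimal sketch of the first-phase grouping is shown below, assuming each navigation path is encoded as a page-visit count vector; the paper's hyperplane-based revision of k-means is not reproduced, and the page names and paths are invented for illustration.

```python
# Sketch: encode navigation paths as page-visit count vectors and group them
# with K-Means as a first-phase clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

pages = ["home", "search", "product", "cart", "checkout", "help"]
paths = [
    ["home", "search", "product", "product", "cart"],
    ["home", "help"],
    ["home", "search", "product", "cart", "checkout"],
    ["home", "help", "search"],
]

def path_to_vector(path):
    v = np.zeros(len(pages))
    for p in path:
        v[pages.index(p)] += 1
    return v

X = normalize(np.array([path_to_vector(p) for p in paths]))  # length-normalized
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```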

A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (클러스터링 성능평가: 신경망 및 통계적 방법)

  • 윤석환; 신용백
    • Journal of the Korean Professional Engineers Association / v.29 no.2 / pp.71-79 / 1996
  • This paper evaluates the clustering performance of a neural network and a statistical method. The algorithms used are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. The mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while its standard deviation is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.

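The statistical side of such a comparison can be sketched with scikit-learn (GLVQ is not shown): repeat k-means with different seeds and report the mean and standard deviation of the Rand index against the true Iris labels.

```python
# Sketch: mean and standard deviation of the Rand index over repeated K-Means
# runs on the Iris data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)
scores = [
    rand_score(y, KMeans(n_clusters=3, n_init=10, random_state=s).fit_predict(X))
    for s in range(20)
]
print(f"Rand index: mean={np.mean(scores):.3f}, std={np.std(scores):.4f}")
```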