• Title/Summary/Keyword: K-means++ algorithm

Search Result 1,363, Processing Time 0.035 seconds

A Fuzzy Clustering Method based on Genetic Algorithm

  • Jo, Jung-Bok;Do, Kyeong-Hoon;Linhu Zhao;Mitsuo Gen
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.1025-1028
    • /
    • 2000
  • In this paper, we apply to a genetic algorithm for fuzzy clustering. We propose initialization procedure and genetic operators such as selection, crossover and mutation, which are suitable for solving the problems. To illustrate the effectiveness of the proposed algorithm, we solve the manufacturing cell formation problem and present computational comparisons to generalized Fuzzy c-Means algorithm.

  • PDF

Improved TI-FCM Clustering Algorithm in Big Data (빅데이터에서 개선된 TI-FCM 클러스터링 알고리즘)

  • Lee, Kwang-Kyug
    • Journal of IKEEE
    • /
    • v.23 no.2
    • /
    • pp.419-424
    • /
    • 2019
  • The FCM algorithm finds the optimal solution through iterative optimization technique. In particular, there is a difference in execution time depending on the initial center of clustering, the location of noise, the location and number of crowded densities. However, this method gradually updates the center point, and the center of the initial cluster is shifted to one side. In this paper, we propose a TI-FCM(Triangular Inequality-Fuzzy C-Means) clustering algorithm that determines the cluster center density by maximizing the distance between clusters using triangular inequality. The proposed method is an effective method to converge to real clusters compared to FCM even in large data sets. Experiments show that execution time is reduced compared to existing FCM.

An Efficient K-means Clustering Algorithm using Prediction (예측을 이용한 효율적인 K-Means 알고리즘)

  • Tae-Chang Jee;Hyunjin Lee;Yillbyung Lee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.3-4
    • /
    • 2008
  • 본 논문에서 k-means 군집화 알고리즘을 효율적으로 적용하는 방법을 제안했다. 제안하는 알고리즘의 특징을 속도 향상을 위해 예측 데이터를 이용한 것이다. 군집화 알고리즘의 각 단계에서 군집을 변경할 데이터만 최인접 군집을 계산함으로써 계산 시간을 줄일 수 있었다. 제안하는 알고리즘의 성능 비교를 위해서 KMHybrid 와 비교했다. 제안하는 알고리즘은 데이터의 차원이 큰 경우에 KMHybrid 보다 높은 속도 향상을 보였다.

Creation of Frequent Patterns using K-means Algorithm for Data Mining Preprocess (데이터 마이닝의 전처리를 위한 K-means 알고리즘을 이용한 빈발패턴 생성)

  • Heui-Jong Yoo;Chi-Yeon Park
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.336-339
    • /
    • 2008
  • 우리가 사용하는 데이터베이스 내에는 많은 양의 데이터 들이 들어 있으며, 계속적으로 그 양은 늘어나고 있다. 이러한 데이터들로부터 질의를 통해 얻을 수 있는 기본적이고 단순한 정보들과 달리 고급 정보를 얻게 해주는 방법이 데이터 마이닝이다. 데이터 마이닝의 기법 중에서 본 논문에서는 k-means 알고리즘을 사용하여 트랜잭션을 클러스터링 함으로써 데이터베이스의 트랜잭션 수를 줄여 연관규칙의 대표적인 알고리즘인 Apriori 알고리즘의 단점인 트랜잭션 스캔으로 인한 성능 저하를 개선하고자 한다.

Efficient Node Deployment Algorithm for Sequence-Based Localization (SBL) Systems (시퀀스 기반 위치추정 시스템을 위한 효율적 노드배치 알고리즘)

  • Park, Hyun Hong;Kim, Yoon Hak
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.658-663
    • /
    • 2018
  • In this paper, we consider node deployment algorithms for the sequence-based localization (SBL) which is recently employed for in-door positioning systems, Whereas previous node selection or deployment algorithms seek to place nodes at centrold of the region where more targets are likely to be found, we observe that the boundaries dividing such regions can be good locations for the nodes in SBL systems. Motivated by this observation, we propose an efficient node deployment algorithm that determines the boundaries by using the well-known K-means algorithm and find the potential node locations based on the bi-section method for low-complexity design. We demonstrate through experiments that the proposed algorithm achieves significant localization performance over random node allocation with a substantially reduced complexity as compared with a full search.

Classification of basin characteristics related to inundation using clustering (군집분석을 이용한 침수관련 유역특성 분류)

  • Lee, Han Seung;Cho, Jae Woong;Kang, Ho seon;Hwang, Jeong Geun;Moon, Hae Jin
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.96-96
    • /
    • 2020
  • In order to establish the risk criteria of inundation due to typhoons or heavy rainfall, research is underway to predict the limit rainfall using basin characteristics, limit rainfall and artificial intelligence algorithms. In order to improve the model performance in estimating the limit rainfall, the learning data are used after the pre-processing. When 50.0% of the entire data was removed as an outlier in the pre-processing process, it was confirmed that the accuracy is over 90%. However, the use rate of learning data is very low, so there is a limitation that various characteristics cannot be considered. Accordingly, in order to predict the limit rainfall reflecting various watershed characteristics by increasing the use rate of learning data, the watersheds with similar characteristics were clustered. The algorithms used for clustering are K-Means, Agglomerative, DBSCAN and Spectral Clustering. The k-Means, DBSCAN and Agglomerative clustering algorithms are clustered at the impervious area ratio, and the Spectral clustering algorithm is clustered in various forms depending on the parameters. If the results of the clustering algorithm are applied to the limit rainfall prediction algorithm, various watershed characteristics will be considered, and at the same time, the performance of predicting the limit rainfall will be improved.

  • PDF

Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery (대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.35 no.3
    • /
    • pp.187-194
    • /
    • 2017
  • The present study introduces a method to parallelize k-means clustering algorithm for fast unsupervised classification of large satellite imagery. Known as a representative algorithm for unsupervised classification, k-means clustering is usually applied to a preprocessing step before supervised classification, but can show the evident advantages of parallel processing due to its high computational intensity and less human intervention. Parallel processing codes are developed by using multi-threading based on OpenMP. In experiments, a PC of 8 multi-core integrated CPU is involved. A 7 band and 30m resolution image from LANDSAT 8 OLI and a 8 band and 10m resolution image from Sentinel-2A are tested. Parallel processing has shown 6 time faster speed than sequential processing when using 10 classes. To check the consistency of parallel and sequential processing, centers, numbers of classified pixels of classes, classified images are mutually compared, resulting in the same results. The present study is meaningful because it has proved that performance of large satellite processing can be significantly improved by using parallel processing. And it is also revealed that it easy to implement parallel processing by using multi-threading based on OpenMP but it should be carefully designed to control the occurrence of false sharing.

A Design of Fuzzy Classifier with Hierarchical Structure (계층적 구조를 가진 퍼지 패턴 분류기 설계)

  • Ahn, Tae-Chon;Roh, Seok-Beom;Kim, Yong Soo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.4
    • /
    • pp.355-359
    • /
    • 2014
  • In this paper, we proposed the new fuzzy pattern classifier which combines several fuzzy models with simple consequent parts hierarchically. The basic component of the proposed fuzzy pattern classifier with hierarchical structure is a fuzzy model with simple consequent part so that the complexity of the proposed fuzzy pattern classifier is not high. In order to analyze and divide the input space, we use Fuzzy C-Means clustering algorithm. In addition, we exploit Conditional Fuzzy C-Means clustering algorithm to analyze the sub space which is divided by Fuzzy C-Means clustering algorithm. At each clustered region, we apply a fuzzy model with simple consequent part and build the fuzzy pattern classifier with hierarchical structure. Because of the hierarchical structure of the proposed pattern classifier, the data distribution of the input space can be analyzed in the macroscopic point of view and the microscopic point of view. Finally, in order to evaluate the classification ability of the proposed pattern classifier, the machine learning data sets are used.

Real-Time Non-Local Means Image Denoising Algorithm Based on Local Binary Descriptor

  • Yu, Hancheng;Li, Aiting
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.2
    • /
    • pp.825-836
    • /
    • 2016
  • In this paper, a speed-up technique for the non-local means (NLM) image denoising method based on local binary descriptor (LBD) is proposed. In the NLM, most of the computation time is spent on searching for non-local similar patches in the search window. The local binary descriptor which represents the structure of patch as binary strings is employed to speed up the search process in the NLM. The descriptor allows for a fast and accurate preselection of non-local similar patches by bitwise operations. Using this approach, a tradeoff between time-saving and noise removal can be obtained. Simulations exhibit that despite being principally constructed for speed, the proposed algorithm outperforms in terms of denoising quality as well. Furthermore, a parallel implementation on GPU brings NLM-LBD to real-time image denoising.

Analysis of spatial mixing characteristics of water quality at the confluence using artificial intelligence (인공지능을 활용한 합류부에서 수질의 공간혼합 특성 분석)

  • Lee, Seo Gyeong;Kim, Dongsu;Kim, Kyungdong;Kim, Young Do;Lyu, Siwan
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.482-482
    • /
    • 2022
  • 하천의 합류부에서는 수질이 다른 유체가 혼합하여 합류 전과 다른 특성을 보인다. 하천의 합류부에서 수질을 효율적으로 관리하기 위해서는 수질의 공간적인 혼합 특성을 규명하는 것이 중요하다. 합류부에서 수질의 공간적인 혼합 특성을 분석하기 위해 본 연구에서는 토폴로지 데이터 분석(topological data analysis, TDA), 자기 조직화 지도(Self-Organizing Map, SOM), k-평균 알고리즘(K-means clustering algorithm) 세 가지 기법을 이용하였다. 세 가지 기법을 비교하여 어떤 알고리즘이 합류부의 수질 변화 특성을 더 뚜렷하게 나타내는지 분석하였다. 수질 변화 비교 인자들은 pH, chlorophyll, DO, Turbidity 등이 있고, 수질 인자들은 YSI를 활용해 측정하였다. 자료의 측정 지역은 낙동강과 황강이 합류하는 지역이며, 보트에 YSI 장비를 부착하고 횡단하여 측정하였다. 측정한 데이터를 R 프로그램을 통해 세 가지 기법을 적용시켜 수질 변화 비교를 분석한다. 토폴로지 데이터 분석(topological data analysis, TDA)은 거대하고 복잡한 데이터로부터 유의미한 정보를 추출하는 데 사용하고, 자기조직화지도(Self-Organizing Map, SOM) 기법은 차원 축소와 군집화를 동시에 수행한다. k-평균 알고리즘(K-means clustering algorithm) 기법은 주어진 데이터를 k개의 클러스터로 묶는 머신러닝 비지도학습에 속하는 알고리즘이다. 세 가지 방법들의 주목적은 클러스터링이다. 클러스터 분석(Cluster analysis)이란 주어진 데이터들의 특성을 고려해 동일한 성격을 가진 여러 개의 그룹으로 대상을 분류하는 데이터 마이닝의 한 방법이다. 군집화 방법들인 TDA, SOM, K-means를 이용해 합류 지역의 수질 특성들을 클러스터링하여 수질 패턴들을 분석해 하천 수질 오염을 방지할 수 있을 것이다. 본 연구에서는 토폴로지 데이터 분석(topological data analysis, TDA), 자기조직화지도(Self-Organizing Map, SOM), k-평균 알고리즘(K-means clustering algorithm) 세 가지 기법을 이용하여 합류부에서의 수질 특성을 비교하며 어떤 기법이 합류의 특성을 더욱 뚜렷하게 나타내는지 규명했다. 합류의 특성을 군집화 방법을 이용해 알게 된다면, 합류부의 수질 변화 패턴을 다른 합류 지역에서도 적용할 수 있을 것으로 기대된다.

  • PDF