• Title/Summary/Keyword: K-Means clustering algorithm

Optimal Arrangement of Patrol Ships based on k-Means Clustering for Quick Response of Marine Accidents (해양사고 신속대응을 위한 k-평균 군집화 기반 경비함정 최적배치)

  • Yoo, Sang-Lok;Jung, Cho-Young
    • Journal of the Korean Society of Marine Environment & Safety / v.23 no.7 / pp.775-782 / 2017
  • The positions of existing patrol ships have been decided by subjective judgment rather than by reasonable or scientific criteria, because of a lack of access to marine accident positions. In this study, the optimal locations of patrol ships are determined quantitatively from historical marine accident data. The study area included the coastal sea of Pohang in South Korea. A k-means clustering algorithm was used to derive the locations of the patrol ships, and a Voronoi diagram was then used to divide the region around each patrol ship. As a result, the average navigation distance for patrol ships improved by 4.4 nautical miles, and the average arrival time improved by 13.2 minutes per marine accident. Moreover, if the locations of patrol ships need to be changed flexibly, the technique developed in this study makes it possible to arrange limited resources optimally and ensure a fast rescue.
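
The combination of k-means and a Voronoi partition described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the accident coordinates, the planar distance measure, and the function names `nearest` and `kmeans` are all assumptions made for illustration. Assigning each accident to its nearest ship is equivalent to locating it in that ship's Voronoi cell.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point, i.e. the Voronoi cell containing it."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, iterations=100):
    """Lloyd's algorithm, started from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

# Hypothetical accident positions in a local planar frame (nautical miles).
accidents = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
ships = kmeans(accidents, centers=[accidents[0], accidents[3]])

# Average distance from each accident to the ship responsible for its cell.
mean_distance = sum(
    math.dist(a, ships[nearest(a, ships)]) for a in accidents
) / len(accidents)
```

Real positions would be latitude and longitude, so a great-circle or route distance would replace `math.dist`, and the initial centers would normally be chosen at random over several restarts.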

A Study on Efficient Memory Management Using Machine Learning Algorithm

  • Park, Beom-Joo;Kang, Min-Soo;Lee, Minho;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.6 no.1 / pp.39-43 / 2017
  • As industry grows, the amount of data grows exponentially, and analysis of these data serves as a predictive solution. As data sizes and processing speeds increase, data analysis has begun to be applied to new fields by combining artificial intelligence technology with simple big data analysis. In this paper, we propose a method to quickly apply a machine learning based algorithm through efficient resource allocation. The proposed algorithm allocates memory for each attribute: it learns the distinct values of each attribute and allocates the right amount of memory. To evaluate the performance of the proposed algorithm, we compared it with the existing K-means algorithm. Measurements of execution time showed that the speed was improved.

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB / v.14B no.7 / pp.513-522 / 2007
  • Clustering divides given data and automatically uncovers hidden meanings in the data. It analyzes data that are difficult for people to check in detail and forms several clusters of data with similar characteristics. An on-line document clustering system, which groups similar documents using the results of a search engine, aims to make information retrieval more convenient. Document clustering is done automatically without human intervention, so the number of clusters, which affects the clustering result, should also be decided automatically. In addition, one of the characteristics of an on-line system is that it must guarantee a fast response time. This paper proposes a method of determining the number of clusters automatically from geometrical information. The proposed method consists of two stages. In the first stage, the centers of the clusters are projected onto a low-dimensional plane; in the second stage, clusters are combined according to the distances between their centers in that plane. Experiments with real data showed that clustering performance improved and that the response time is suitable for on-line use.
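
The second stage described in this abstract, combining clusters whose projected centers lie close together, might look like the sketch below. The abstract does not give the merging rule, so the fixed distance `threshold`, the single-linkage grouping, and the name `merge_close_centers` are assumptions; the centers are taken as already projected onto a 2-D plane.

```python
import math

def merge_close_centers(centers_2d, threshold):
    """Group projected centers that are closer than threshold (single linkage)."""
    parent = list(range(len(centers_2d)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers_2d)):
        for j in range(i + 1, len(centers_2d)):
            if math.dist(centers_2d[i], centers_2d[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centers_2d)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Hypothetical projected centers of five initial clusters.
projected = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
merged = merge_close_centers(projected, threshold=1.0)
number_of_clusters = len(merged)
```

Under these assumptions, the number of groups that remain after merging is taken as the number of clusters.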

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.745-758 / 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed to group observations into clusters, and a variety of measures aimed at validating the results of a cluster analysis have been suggested. In this paper, we compare complete linkage, Ward's method, K-means and model-based clustering, and compute validity measures such as connectivity, the Dunn index and the silhouette using simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull prepared by the BBQ cooking method.
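
Two of the validity measures named in this abstract can be computed directly from their usual definitions, as in the sketch below. The function names and the toy data are assumptions, and Euclidean distance is assumed throughout; the paper's own computation may differ in detail.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return separation / diameter

def mean_silhouette(clusters):
    """Average silhouette over all points; every cluster needs at least two points."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            # a: mean distance to the other members of the point's own cluster.
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            # b: mean distance to the closest other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# The same four points grouped sensibly and badly.
good = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
bad = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
```

Both measures are higher for `good` than for `bad`, which is the sense in which they are used to compare clustering methods or candidate numbers of clusters.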

A Study on improvement of sounding density of ENCs (전자해도 수심 밀집도 개선에 관한 연구)

  • Oh, Se-Woong;Park, Jong-Min;Suh, Sang-Hyun;Lee, Moon-Jin;Jeon, Tae-Byung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2011.06a / pp.34-36 / 2011
  • ENCs are edited from the numerical charts used for publishing paper charts and are served in grid form. For this reason, the density of sounding information in ENCs is not consistent and needs improvement. In this study, the K-Means and ISODATA clustering algorithms, which are used as classification methods for satellite images, were reviewed and applied in a case study. The developed results include a loading module for ENC data, an algorithm for improving sounding information, and a writing module for ENC data. Applying the algorithm confirmed the improvement.

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost, and consequently there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, respectively, while the model performance improved to RSR 0.46 and 0.55, respectively, for Model 1.
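
The clustering step and the RSR metric in this abstract can be sketched as follows. This is an illustration under assumptions, not the study's code: `split_by_discharge` is a hypothetical one-dimensional k-means with k=2, the discharge values are made up, and `rsr` uses the common definition of RSR as the root mean squared error divided by the standard deviation of the observations. The XGBoost models themselves are omitted.

```python
import math

def split_by_discharge(q_values, iterations=50):
    """1-D k-means with k=2; returns label 0 (low Q) or 1 (high Q) per value.

    Assumes q_values contains at least two distinct values.
    """
    low, high = min(q_values), max(q_values)
    labels = []
    for _ in range(iterations):
        labels = [0 if abs(q - low) <= abs(q - high) else 1 for q in q_values]
        lows = [q for q, label in zip(q_values, labels) if label == 0]
        highs = [q for q, label in zip(q_values, labels) if label == 1]
        new_low, new_high = sum(lows) / len(lows), sum(highs) / len(highs)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return labels

def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    error = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)))
    spread = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    return error / spread

# Hypothetical discharge record; each group would get its own model.
discharge = [1.0, 2.0, 1.5, 50.0, 60.0, 55.0]
groups = split_by_discharge(discharge)
```

An RSR of 0 means a perfect fit, and a model that always predicts the observed mean scores 1, so lower values such as those reported for Model 1 indicate better performance.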

Cluster-based Deep One-Class Classification Model for Anomaly Detection

  • Younghwan Kim;Huy Kang Kim
    • Journal of Internet Technology / v.22 no.4 / pp.903-911 / 2021
  • As cyber-attacks on Cyber-Physical Systems (CPS) become more diverse and sophisticated, it is important to quickly detect malicious behaviors occurring in CPS. Since CPS can collect sensor data in near real time throughout the process, there have been many attempts to detect anomalous behavior by learning normal behavior, from the perspective of data-driven security. However, since CPS datasets are large and most of the data are normal, it has always been a great challenge to analyze the data and implement an anomaly detection model. In this paper, we propose and evaluate the Clustered Deep One-Class Classification (CD-OCC) model, which combines a clustering algorithm and a deep learning (DL) model using only a normal dataset for anomaly detection. We use an auto-encoder to reduce the dimensions of the dataset and the K-means clustering algorithm to group the normal data into the optimal number of clusters. The DL model is trained to predict the clusters of normal data, and its outputs are logit values. The derived logit values form a dataset that can better represent normal data in terms of knowledge distillation, and they are used as inputs to the OCC model. In the experiments, the F1 score of the proposed model is 0.93 and 0.83 on the SWaT and HAI datasets, respectively, a significant performance improvement over other recent detectors such as Com-AE and SVM-RBF.

Machine-Part Grouping Formation Using Grid Computing (그리드 컴퓨팅을 이용한 기계-부품 그룹 형성)

  • Lee, Jong-Sub;Kang, Maing-Kyu
    • Journal of Korean Institute of Industrial Engineers / v.30 no.3 / pp.175-180 / 2004
  • Machine-part group formation groups sets of parts with similar processing requirements into part families, and the sets of machines needed to process a particular part family into machine cells, using grid computing. The proposed algorithm forms machine cells from the machine-part incidence matrix by means of Self-Organizing Maps (SOM) whose output layer is one-dimensional and whose number of output nodes is twice the number of input nodes, in order to spread out the machine vectors. It generates machine-part groups, which are assigned to machine cells according to the number of bottleneck machines processing each part. The proposed algorithm was tested on well-known machine-part grouping problems. The results of this computational study demonstrate the superiority of the proposed algorithm.

Unsupervised Learning Model for Fault Prediction Using Representative Clustering Algorithms (대표적인 클러스터링 알고리즘을 사용한 비감독형 결함 예측 모델)

  • Hong, Euyseok;Park, Mikyeong
    • KIPS Transactions on Software and Data Engineering / v.3 no.2 / pp.57-64 / 2014
  • Most previous studies of software fault prediction models, which determine the fault-proneness of input modules, have focused on supervised learning models that use training data sets. However, an unsupervised learning model is needed when a supervised model cannot be applied: either no past training data set exists, or one exists but the current project type has changed. Building an unsupervised learning model is extremely difficult, which is why only a few studies exist. In this paper, we build unsupervised models using representative clustering algorithms, EM and DBSCAN, that have not been used in prior studies, and compare these models with the previous model based on the K-means algorithm. The results show that the EM model performs slightly better than the K-means model in terms of error rate, and that these two models significantly outperform the DBSCAN model.

A Study on Three Phase Partial Discharge Pattern Classification with the Aid of Optimized Polynomial Radial Basis Function Neural Networks (최적화된 pRBF 뉴럴 네트워크에 이용한 삼상 부분방전 패턴분류에 관한 연구)

  • Oh, Sung-Kwun;Kim, Hyun-Ki;Kim, Jung-Tae
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.4 / pp.544-553 / 2013
  • In this paper, we propose a pattern classifier based on Radial Basis Function Neural Networks (RBFNNs) for the diagnosis of 3-phase partial discharge. Conventional methods map the partial discharge/noise data onto a 3-PARD map and decide whether or not partial discharge occurs at the three phases or the neutral point. However, this decision rests on the subjective knowledge of a skilled expert. To solve this problem, the mapping of data and the classification of phases are carried out using the general 3-PARD map and the PA method, and the phases in which partial discharge or noise discharge occurs are identified. Subsequently, the type of partial discharge occurring on an arbitrary phase is classified and identified by a fuzzy clustering-based polynomial RBFNN classifier. The RBFNN classifier is optimized by tuning the learning rate, the momentum coefficient, and the fuzzification coefficient of FCM fuzzy clustering with the aid of a PSO algorithm. Simulated data and experimental data acquired in the field are used to evaluate the performance of the 3-phase partial discharge pattern classifier.
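
The role of the fuzzification coefficient mentioned in this abstract can be seen in the standard fuzzy c-means (FCM) membership formula, sketched below. The function name `fcm_memberships`, the toy centers, and the Euclidean distance are assumptions; the paper's classifier, its polynomial RBFNN, and the PSO search are not reproduced here.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """FCM membership of point in each cluster, for fuzzification coefficient m > 1."""
    distances = [math.dist(point, c) for c in centers]
    if 0.0 in distances:
        # The point coincides with one or more centers: share full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# Hypothetical cluster centers and one data point.
centers = [(1.0, 0.0), (3.0, 0.0)]
crisp = fcm_memberships((0.0, 0.0), centers, m=2.0)
soft = fcm_memberships((0.0, 0.0), centers, m=3.0)
```

Raising `m` spreads membership more evenly across clusters, which is why it is a natural parameter for an optimizer such as PSO to tune.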