• Title/Summary/Keyword: Analysis of means

Analysis Process based on Modify K-means for Efficiency Improvement of Electric Power Data Pattern Detection (전력데이터 패턴 추출의 효율성 향상을 위한 변형된 K-means 기반의 분석 프로세스)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Yong Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.20 no.12 / pp.1960-1969 / 2017
  • Research is ongoing into identifying and analyzing patterns in electric power IoT data collected at sensor nodes, with the aim of supporting a stable power supply and efficient energy consumption. This study proposes an analysis process for electric power IoT data based on the K-means algorithm, which is an unsupervised rather than a supervised learning technique. The conventional K-means algorithm has several problems, one of which is that the number of clusters K is selected heuristically or at random. That approach is suited to the age of standardized data. The authors propose an analysis process that selects the cluster number K automatically through principal component analysis and space division of the normal distribution, and apply it to electric power IoT data. The performance evaluation shows that the process outperformed the conventional algorithm in the cluster classification and analysis of the pitch and roll values included in the communication bodies of utility poles.
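
For orientation, the baseline that the abstract calls the old K-means algorithm can be sketched as plain Lloyd iterations in which the caller must supply K. This is a minimal illustration using only the Python standard library; it is not the authors' automated process (the PCA and normal-distribution steps are not reproduced), and the function name `kmeans` and its parameters are assumptions made for this sketch.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples.

    K is chosen by the caller, which is the manual step the abstract criticizes.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = [0] * len(points)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]

        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid

        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids

    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of 2-D points returns one centroid per group. The result generally depends on `k` and on the random initial centroids, which is why automated selection of K is of interest.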

Extraction of Blood Flow of Brachial Artery on Color Doppler Ultrasonography by Using 4-Directional Contour Tracking and K-Means Algorithm (4 방향 윤곽선 추적과 K-Means 알고리즘을 이용한 색조 도플러 초음파 영상에서 상환 동맥의 혈류 영역 추출)

  • Park, Joonsung;Kim, Kwang Baek
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.11 / pp.1411-1416 / 2020
  • In this paper, we propose a method for extracting and analyzing the blood flow area in color Doppler ultrasonography using 4-directional contour tracking and the K-Means algorithm. In the proposed method, an ROI is extracted and binarized with the maximum contrast as the threshold. The 4-directional contour tracking algorithm is then applied to the binarized ROI to extract the trapezoid-shaped region that contains the blood flow area of the brachial artery. K-Means based quantization is then applied to accurately extract the blood flow area of the brachial artery from the trapezoid-shaped region. In the experiment, the proposed method successfully extracted the target area in 28 out of 30 cases (93.3%), as verified by a field expert. A comparison between the blood flow areas extracted by the proposed K-Means based method from 30 color Doppler ultrasonography images and the brachial artery blood flow ultrasonography provided by a specialist yielded an average accuracy of 94.27%.

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.105-112 / 2018
  • As data size increases, a single database is no longer enough to serve the current volume of tasks. Because data is partitioned and stored across multiple databases, analysis should also support parallelism in order to increase efficiency. However, traditional analysis requires data to be transferred out of the database to the nodes where the analytic service runs, and the user is required to know both the database and the analytic framework. In this paper, we propose an efficient way to perform the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.

A Variable Selection Procedure for K-Means Clustering

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.471-483 / 2012
  • One of the most important problems in cluster analysis is the selection of variables that truly define cluster structure, while eliminating noisy variables that mask such structure. Brusco and Cradit (2001) presented the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering based on the adjusted Rand index. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure provided by Kim (2009). This automated variable selection procedure for K-means clustering recalculates the cluster number and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The program, implemented in R, can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
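
The adjusted Rand index that drives the variable selection above compares two partitions of the same items and corrects the raw pair agreement for chance. The following is a minimal standard-library sketch of the usual pair-counting formula; it does not reproduce the VS-KM procedure or the R program mentioned in the abstract, and the function name `adjusted_rand_index` is an assumption made for this illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs placed together in both partitions (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (table margins).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together_both - expected) / (maximum - expected)
```

The index is 1 when the two partitions are identical up to a relabeling of the clusters, and it is close to 0 (and can be negative) when agreement is no better than chance.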

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia pacific journal of information systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature with consideration of their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. The experiment also analyzes how varying the number of points, attributes, and clusters affects performance. The results of the simulation experiment show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
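
As a concrete reference for the best-performing index named above, the Calinski and Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom: $CH = \frac{B/(k-1)}{W/(n-k)}$. The sketch below uses only the Python standard library and is not taken from the paper; the function name `calinski_harabasz` and the handling of the zero-dispersion case are assumptions made for this illustration.

```python
def calinski_harabasz(points, labels):
    """Calinski and Harabasz index for points (equal-length tuples) and cluster labels."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n clusters")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # B: size-weighted squared distance of cluster means to the overall mean
    within = 0.0   # W: squared distance of points to their own cluster mean
    for members in clusters.values():
        center = mean(members)
        between += len(members) * sqdist(center, overall)
        within += sum(sqdist(p, center) for p in members)

    if within == 0.0:
        return float("inf")  # assumption: perfectly tight clusters score as infinite
    return (between / (k - 1)) / (within / (n - k))
```

Higher values indicate more compact, better-separated clusters, so one common use is to run K-means for several candidate values of K and keep the K with the largest index.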

Bayesian Model Selection for Inverse Gaussian Populations with Heterogeneity

  • Kang, Sang-Gil;Kim, Dal-Ho;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.19 no.2 / pp.621-634 / 2008
  • This paper addresses the problem of testing whether the means of several heterogeneous inverse Gaussian populations are equal. The analysis of reciprocals for the equality of inverse Gaussian means requires the assumption of equal scale parameters. We propose Bayesian model selection procedures for testing the equality of inverse Gaussian means under a noninformative prior, without the assumption of equal scale parameters. The noninformative prior is usually improper, which creates a calibration problem: the Bayes factor is defined only up to a multiplicative constant. We therefore propose objective Bayesian model selection procedures based on the fractional Bayes factor and the intrinsic Bayes factor under the reference prior. A simulation study and a real data analysis are provided.

Partial Discharge Distribution Analysis on Interlace Defects of Cable Joint using K-means Clustering (K-means 클러스터링을 이용한 케이블 접속재 계면결함의 부분방전 분포 해석)

  • Cho, Kyung-Soon;Hong, Jin-Woong
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.20 no.11 / pp.959-964 / 2007
  • To investigate how various defects at the power cable joint interface influence partial discharge (PD) distribution characteristics, we used the K-means clustering method. Analysis of the PD number (n) distribution on the $\Phi-n$ graph showed that the phase angle ($\Phi$) of the cluster centroid shifted toward $0^{\circ}$ and $180^{\circ}$ as the applied voltage increased. Analysis of the centroid distribution on the $\Phi-q$ plane confirmed that the PD quantity (q) and the Euclidean distance of the centroid increased with the applied voltage. The degree of dispersion increased with the calculated standard deviation of the $\Phi-q$ cluster centroid. The PD number and the mean value on the $\Phi-q$ graph differed somewhat between defect types because of electric field concentration.

Design of Pattern Classification Rule based on Local Linear Discriminant Analysis Classifier by using Differential Evolutionary Algorithm (차분진화 알고리즘을 이용한 지역 Linear Discriminant Analysis Classifier 기반 패턴 분류 규칙 설계)

  • Roh, Seok-Beom;Hwang, Eun-Jin;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.1 / pp.81-86 / 2012
  • In this paper, we propose a new design methodology for a pattern classification rule based on local linear discriminant analysis, an extension of generic linear discriminant analysis that is applied within local areas partitioned from the whole input space. Two methods are used to partition the whole input space into several local areas: the k-Means clustering method and the differential evolutionary algorithm. The k-Means clustering method is an unsupervised clustering method, and the differential evolutionary algorithm is an optimization algorithm. In addition, the experimental application includes a comparative analysis against several commonly used earlier methods.

Combined Artificial Bee Colony for Data Clustering (융합 인공벌군집 데이터 클러스터링 방법)

  • Kang, Bum-Su;Kim, Sung-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.203-210 / 2017
  • Data clustering is one of the most difficult and challenging problems and can be formally regarded as a particular kind of NP-hard grouping problem. The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it is highly likely to become trapped in a local optimum, and for large data sets its solutions vary widely with different initial values. We therefore need to study efficient computational intelligence methods that find the global optimal solution to the data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) that uses K-means for initialization and finalization to find an optimal solution effectively in the data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent behavior of honeybees searching for food. ABC performs better than or similarly to other population-based algorithms, with the added advantage of using fewer control parameters. Our proposed CABC method provides a near-optimal solution within reasonable time by balancing converged and diversified searches. The experiments and analysis of clustering problems in this paper demonstrate that CABC is competitive with previous partitioning approaches in terms of solution quality. We validate the performance of CABC on the Iris, Wine, Glass, Vowel, and Cloud datasets from the UCI machine learning repository, comparing it with previous studies. In our simulations, the proposed KABCK (K-means+ABC+K-means) performs better than ABCK (ABC+K-means), KABC (K-means+ABC), ABC, and K-means.

Pattern Analysis and Performance Comparison of Lottery Winning Numbers

  • Jung, Yong Gyu;Han, Soo Ji;Kim, Jae Hee
    • International Journal of Internet, Broadcasting and Communication / v.6 no.1 / pp.16-22 / 2014
  • Clustering methods such as k-means and EM belong to the family of classification and pattern recognition techniques and are widely used in management science and literature search. In this paper, the performance of the k-means and EM algorithms is compared using Weka. The experiment uses 567 cases of winning lottery numbers. The k-means algorithm is faster than the EM algorithm, by about 0.08 seconds. In summary, the EM algorithm is better than the K-means algorithm in terms of accuracy, precision, and recall. While K-means is known to be sensitive to the distribution of the data, the EM algorithm is sensitive to probabilities in clustering.