• Title/Summary/Keyword: K-means cluster

XML Document Clustering Technique by K-means algorithm through PCA (주성분 분석의 K 평균 알고리즘을 통한 XML 문서 군집화 기법)

  • Kim, Woo-Saeng
    • The KIPS Transactions:PartD / v.18D no.5 / pp.339-342 / 2011
  • Recent research has focused on efficient techniques for accessing, querying, and storing XML documents, which are widely used on the Internet. In this paper, we propose a new method for clustering XML documents efficiently. Each XML document is first represented as a vector in a feature space built from the names and levels of the elements in its corresponding tree. We then cluster these vectors using the K-means algorithm with Principal Component Analysis (PCA). Experiments show that the proposed method produces good results.
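The vector representation described in this abstract can be illustrated with a minimal sketch. The encoding below, counting (element name, level) pairs with the root at level 0, is an assumption for illustration and not necessarily the authors' exact scheme; the function names and sample documents are hypothetical. The resulting vectors would then be passed to PCA and K-means.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def element_level_counts(xml_text):
    """Count (element name, level) pairs in one XML document; the root is level 0."""
    counts = Counter()
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, level = stack.pop()
        counts[(node.tag, level)] += 1
        stack.extend((child, level + 1) for child in node)
    return counts

def to_vectors(docs):
    """Map each document to a count vector over a shared, sorted feature list."""
    per_doc = [element_level_counts(d) for d in docs]
    features = sorted(set().union(*per_doc))
    return features, [[c[f] for f in features] for c in per_doc]

# Two hypothetical documents with overlapping element names at different levels.
docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><chapter><title/></chapter></book>",
]
features, vectors = to_vectors(docs)
for doc_vector in vectors:
    print(doc_vector)
```

Because the level is part of each feature, a `title` directly under the root and a `title` nested inside a `chapter` land in different dimensions, so documents with the same tags but different structure are not treated as identical.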

Automatic Dynamic Range Improvement Method using Histogram Modification and K-means Clustering (히스토그램 변형 및 K-means 분류 기반 동적 범위 개선 기법)

  • Cha, Su-Ram;Kim, Jeong-Tae;Kim, Min-Seok
    • Journal of Broadcast Engineering / v.16 no.6 / pp.1047-1057 / 2011
  • In this paper, we propose a novel tone mapping method that applies a histogram modification framework to two local regions classified by the K-means clustering algorithm. In addition, we propose an automatic parameter tuning method for histogram modification. The proposed method enhances local details better than the global histogram method. Moreover, it is fully automatic in the sense that it requires no human intervention to tune the parameters used to compute the tone mapping functions. In simulations and experimental studies, the proposed method outperformed the existing histogram modification method.

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes the results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. The quality of topic words increased by 33% with the following scheme: only nouns are extracted as the topic word of the top-level cluster, and topic words that have already been used are not reused for child clusters.

Classification and Characteristic analysis of Mountain Village Landscape Using Cluster Analysis (군집분석을 이용한 산촌경관 유형 구분 및 특성 분석)

  • Ko, Arang;Lim, Jungwoo;Kim, Seong Hak
    • Journal of Korean Society of Rural Planning / v.26 no.1 / pp.101-112 / 2020
  • Public awareness of mountain village landscapes has recently been increasing. This study therefore aimed to provide standards for the conservation, management, and creation of mountain village landscapes by characterizing and classifying existing ones. Data on 286 mountain villages were collected, and 19 variables, extracted from GIS spatial information and statistical data on mountain villages and chosen as appropriate sources based on previous studies, were used to conduct factor and cluster analysis. The factor analysis defined 7 characteristics of mountain village landscapes: 'Location', 'Cultivation', 'Ecology·Nature', 'Tourism', 'Residence', 'Recreation'. The K-means cluster analysis categorized the mountain village landscapes into four types: 'Residential', 'Touristic', 'General', and 'Environmentally protected'. Field assessment confirmed that the classification was appropriate, and basic guidelines for mountain village landscape management were set. The results of this study are expected to be used in future planning and implementation concerning mountain village landscapes.

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3714-3732 / 2019
  • With the rapid development of networks, the Intrusion Detection System (IDS) plays an increasingly important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive amounts of data are generated. When dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes the IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. We then select representative instances from each cluster to perform data reduction and use the clusters of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters by the distance between the test sample and the cluster indexes and obtain the k nearest clusters, within which we find the k nearest neighbors. Experimental results show that searching for neighbors by cluster index significantly reduces the computational complexity, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
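The detection stage described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: cluster assignments are given as hypothetical input (in the paper they would come from Mini Batch K-Means), the representative-instance selection step is omitted, and all names and sample points are invented for the example.

```python
import math
from collections import Counter

def build_index(points, labels, assignments):
    """Group labeled training points by cluster id and store each cluster's centroid as its index."""
    clusters = {}
    for p, y, c in zip(points, labels, assignments):
        clusters.setdefault(c, []).append((p, y))
    index = []
    for members in clusters.values():
        dim = len(members[0][0])
        centroid = [sum(p[i] for p, _ in members) / len(members) for i in range(dim)]
        index.append((centroid, members))
    return index

def classify(index, x, k):
    """Search only the k clusters whose centroids are nearest to x, then vote among the k nearest members."""
    nearest_clusters = sorted(index, key=lambda c: math.dist(c[0], x))[:k]
    candidates = [m for _, members in nearest_clusters for m in members]
    neighbors = sorted(candidates, key=lambda m: math.dist(m[0], x))[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

# Hypothetical training data; labels and coordinates are illustrative only.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
labels = ["normal", "normal", "attack", "attack", "normal", "normal"]
assignments = [0, 0, 1, 1, 2, 2]  # hypothetical cluster ids
index = build_index(points, labels, assignments)
prediction = classify(index, (4.8, 5.2), k=1)
print(prediction)
```

The saving comes from comparing the test sample against a handful of centroids first, so the full distance computation runs only over the members of the selected clusters rather than the whole training set.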

Determination of Optimal Cluster Size Using Bootstrap and Genetic Algorithm (붓스트랩 기법과 유전자 알고리즘을 이용한 최적 군집 수 결정)

  • Park, Min-Jae;Jun, Sung-Hae;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.12-17 / 2003
  • Determining the cluster size (the number of clusters, K) optimally affects the result of clustering. In the K-means algorithm, clustering performance varies widely with the initial K. In most clustering processes, however, the initial cluster size is determined by prior knowledge or subjective judgment, and this subjective choice may not be optimal. In this paper, we propose a genetic-algorithm-based approach that determines the cluster size automatically and improves the clustering result. The initial population is generated based on attributes to search for the optimal cluster size. The fitness value is defined as the inverse of the sum of dissimilarities, so the search converges toward better overall performance. The mutation operation is used to address the local minima problem. Finally, bootstrap resampling is used to reduce the computational time cost.

An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv
    • Journal of Information Processing Systems / v.20 no.2 / pp.185-199 / 2024
  • In this paper, an improved automated spectral clustering (IASC) algorithm is proposed to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to automatically determine the number of clusters. First, a cluster number evaluation factor based on the optimal clustering principle is proposed. By iterating through different k values, the value corresponding to the largest evaluation factor is selected as the first-rank number of clusters. Second, the IASC algorithm adopts a density-sensitive distance to measure the similarity between sample points, which assigns a high similarity to data distributed in the same high-density area. Third, to improve clustering accuracy, the IASC algorithm uses the cosine angle classification method instead of K-means to classify the eigenvectors. Six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) were compared with the proposed algorithm on six datasets. The results show that the IASC algorithm not only automatically determines the number of clusters but also obtains better clustering accuracy on both synthetic and UCI datasets.

Partial Discharge Data Analysis with Unsupervised Classification (무감독분류 기법에 의한 부분방전 데이터 분석)

  • Cho, Kyungsoon;Hong, Seonhack
    • Journal of Korea Society of Digital Industry and Information Management / v.14 no.4 / pp.9-16 / 2018
  • This study describes partial discharge (PD) distribution analysis at the interface between XLPE (cross-linked polyethylene) and EPDM (ethylene propylene diene monomer) using unsupervised classification. The $\phi$-$q$-$n$ patterns were analyzed using phase-resolved partial discharge (PRPD). K-means cluster analysis is a statistical method that forms clusters based on similarities and distances among scattered individuals and analyzes the characteristics of the formed clusters; by dividing multivariate data into several groups according to the similarity of their characteristics, it makes the data easier to navigate. As the voltage rose, the initial phase distribution, localized around $90^{\circ}$ and $300^{\circ}$, expanded over the whole phase angle range, and at 30 kV the phase angle of the cluster with the maximum discharge charge was concentrated around $0^{\circ}$ and $180^{\circ}$. The Euclidean distance between the center of gravity and the discharge charge in the $\phi$-$q$ cluster increased with increasing applied voltage.
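The K-means procedure summarized in this abstract can be sketched with the standard Lloyd iteration: assign each point to its nearest center of gravity by Euclidean distance, then recompute the centers. The sample values below are hypothetical (phase angle, discharge charge) pairs chosen for illustration, not data from the study, and the initial centers are arbitrary assumptions.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c  # an empty cluster keeps its centroid
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # assignments are stable, so the algorithm has converged
            break
        centroids = updated
    return centroids, groups

# Hypothetical (phase angle in degrees, discharge charge) samples.
points = [(85.0, 1.0), (90.0, 1.5), (95.0, 0.5), (295.0, 2.0), (300.0, 2.5), (305.0, 1.5)]
centroids, groups = kmeans(points, centroids=[(0.0, 0.0), (360.0, 0.0)])
print(centroids)
```

In practice the two axes have very different scales, so features are usually normalized before clustering; that step is left out here to keep the sketch short.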

Factors affecting to the Quality of Korean Soybean Paste, Doenjang (한국 된장의 품질에 영향을 미치는 요인)

  • Shim, Hye-Jeoung;Yun, Jeong-hyun;Koh, Kyung-Hee
    • Journal of Applied Biological Chemistry / v.61 no.4 / pp.357-365 / 2018
  • The quality of Korean doenjang, made in the traditional way for this study, was monitored for physicochemical properties, antioxidant capacity, and sensory properties at six-month intervals for three years. The collected data were comprehensively analyzed using k-means clustering via principal component analysis (PCA) to determine the optimal intake duration and the sensory factors associated with acceptance. Doenjang samples were classified by yearly interval based on PCA, and the classified samples were then further grouped into Clusters 1, 2, and 3 based on k-means clustering. In Cluster 3, doenjang aged for thirty and thirty-six months showed high total phenolic content, antioxidant capacity, superoxide dismutase-like activity, and 2,2-diphenyl-1-picryl-hydrazyl radical scavenging capacity. Interestingly, along with acceptance, the levels of free amino acids and organic acids were higher in Cluster 3. The sensory factors found to be associated with acceptance included umami taste and brown color. In conclusion, this study proposes the intake of doenjang aged for thirty months based on its antioxidant activity and sensory properties, although doenjang is usually ready after twelve months of aging.

Performance Evaluation of k-means and k-medoids in WSN Routing Protocols

  • SeaYoung, Park;Dai Yeol, Yun;Chi-Gon, Hwang;Daesung, Lee
    • Journal of information and communication convergence engineering / v.20 no.4 / pp.259-264 / 2022
  • In wireless sensor networks, sensor nodes are often deployed in large numbers in places that are difficult for humans to access. However, the energy of each sensor node is limited, and once it is exhausted, the node can no longer be used. One of the most important considerations when designing routing protocols for wireless sensor networks is therefore minimizing the energy consumption of each sensor node, and various protocols are being designed to minimize energy consumption and prolong network lifetime. We previously proposed KOCED, an optimal cluster K-means algorithm that considers the distances between cluster centers, nodes, and residual energies. This study evaluates the performance of the KOCED protocol with respect to energy efficiency and validation. Its purpose is to present performance evaluation factors by comparing the K-means algorithm and the K-medoids algorithm, one of the recently introduced machine learning techniques, with the KOCED protocol.
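The core difference between the two algorithms compared here is how a cluster center is chosen, which a short sketch can make concrete. The node coordinates are hypothetical and the functions are illustrative only; they do not reproduce the KOCED protocol, which also takes residual energy into account.

```python
import math

def mean_center(points):
    """K-means style center: the coordinate-wise mean, which need not coincide with any node."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def medoid_center(points):
    """K-medoids style center: the member with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical node positions in one cluster, with one distant outlier node.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]
center_mean = mean_center(nodes)
center_medoid = medoid_center(nodes)
print(center_mean, center_medoid)
```

The outlier pulls the mean away from the dense group of nodes, while the medoid stays on an actual node inside it, which is one reason the two methods can select different cluster centers for the same deployment.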