• Title/Summary/Keyword: K-means cluster

Bootstrap Method for k-Spatial Medians

  • Jhun, Myoung-Shic
    • Journal of the Korean Statistical Society / v.15 no.1 / pp.1-8 / 1986
  • The k-medians clustering method is considered to partition observations into k clusters. Consistency and advantage of bootstrap confidence sets of k optimal cluster centers are discussed. The k-medians and k-means clustering methods are compared by using actual data sets.
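The difference between a k-means center and a spatial-median center can be illustrated for a single cluster. The sketch below is not taken from the paper: it assumes Euclidean distance, uses the standard Weiszfeld iteration for the spatial median, and the sample points are made up. It shows how one outlier pulls the mean far more than the spatial median.

```python
import math


def mean_point(points):
    """Coordinate-wise mean: the center that minimizes summed squared distances."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def spatial_median(points, max_iter=500, tol=1e-10):
    """Weiszfeld iteration for the point minimizing summed Euclidean distances."""
    y = mean_point(points)  # start from the mean
    for _ in range(max_iter):
        num = [0.0] * len(y)
        den = 0.0
        for p in points:
            dist = math.dist(p, y)
            if dist < 1e-12:
                continue  # simplification: skip a point that coincides with y
            w = 1.0 / dist
            den += w
            for d in range(len(y)):
                num[d] += w * p[d]
        if den == 0.0:
            break
        y_new = [v / den for v in num]
        if math.dist(y, y_new) < tol:
            return y_new
        y = y_new
    return y


# Hypothetical cluster: four points on a unit square plus one distant outlier.
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
center_mean = mean_point(cluster)
center_median = spatial_median(cluster)
print("mean:", center_mean)
print("spatial median:", center_median)
```

In a full k-medians procedure this center computation would replace the averaging step of k-means; the assignment step stays the same.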

Enhancing Document Clustering Method using Synonym of Cluster Topic and Similarity (군집 주제의 유의어와 유사도를 이용한 문서군집 향상 방법)

  • Park, Sun;Kim, Kyung-Jun;Lee, Jin-Seok;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.5 / pp.30-38 / 2011
  • This paper proposes a method for enhancing document clustering using synonyms of cluster topic terms and similarity. The proposed method represents the inherent structure of a document cluster set well by selecting cluster topic terms based on semantic features obtained by NMF. It addresses the "bag of words" problem by expanding the cluster topic terms with synonyms from WordNet. It also improves the quality of document clustering by using the cosine similarity between the expanded cluster topic terms and the document set to assign each document to the appropriate cluster. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
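The idea of expanding topic terms with synonyms and then assigning documents by cosine similarity can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the `SYNONYMS` table is a tiny hand-written stand-in for WordNet, the topic terms are invented rather than derived from NMF, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {"car": ["automobile"], "film": ["movie"]}


def expand(terms):
    """Return the topic terms together with their listed synonyms."""
    expanded = set(terms)
    for term in terms:
        expanded.update(SYNONYMS.get(term, []))
    return expanded


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(doc_tokens, topics):
    """Pick the cluster whose expanded topic terms are most similar to the document."""
    doc_vec = Counter(doc_tokens)
    scores = {name: cosine(doc_vec, Counter(expand(terms)))
              for name, terms in topics.items()}
    return max(scores, key=scores.get)


# Invented topic terms for two clusters.
topics = {"vehicles": ["car", "engine"], "cinema": ["film", "actor"]}
print(assign("a great movie".split(), topics))
```

Without the expansion step, the document "a great movie" shares no term with either topic, which is the vocabulary-mismatch problem the abstract refers to.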

DNA Marker Mining of BMS1167 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won;Kwon, Jae-Chul
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.325-333 / 2006
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS1167 resulted in three cluster groups. We conclude that the major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 are markers 100bp, 108bp and 110bp.

A Major DNA Marker Mining of BMS941 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.913-921 / 2005
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS941 resulted in three cluster groups. We conclude that the major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 are markers 80bp, 85bp, 90bp and 105bp.

Analysis of Document Clustering Varing Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference / 2002.06c / pp.99-102 / 2002
  • The K-means clustering algorithm is a very popular clustering technique used in the field of information retrieval. In this paper, we address a weakness of the K-means algorithm in how centroids are created, and suggest a method that reflects document features and considers the context of each document when determining new centroids. For the experiment, we used an automatic document summarizer to summarize the Reuters-21578 newswire test dataset and achieved a 20% improvement in recall.

A New Model-Based Clustering Algorithm (새로운 모형기반 군집분석 알고리즘)

  • Park, Jeong-Su;Hwang, Hyeon-Sik
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.97-100 / 2005
  • A new model-based clustering algorithm is proposed. The idea starts from the assumption that observations are realizations of Gaussian processes and so are correlated. With a special covariance structure, the posterior probability that an observation belongs to each cluster is computed using the ECM algorithm. A preliminary result from a small-scale simulation study is given for comparison with the k-means clustering algorithm.

VS-FCM: Validity-guided Spatial Fuzzy c-Means Clustering for Image Segmentation

  • Kang, Bo-Yeong;Kim, Dae-Won
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.89-93 / 2010
  • In this paper, a new fuzzy clustering approach to the color clustering problem is proposed. To deal with the limitations of the traditional FCM algorithm, we propose a spatial homogeneity-based FCM algorithm. Moreover, a cluster validity index is employed to automatically determine the number of clusters for a given image. We refer to this method as the VS-FCM algorithm. The effectiveness of the proposed method is demonstrated through various clustering examples.
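For reference, the traditional FCM algorithm that the abstract starts from alternates between a membership update and a center update. The sketch below covers only that standard baseline on one-dimensional values, with fuzzifier `m = 2` assumed; it does not include the spatial homogeneity term or the validity index of VS-FCM, and the data and function names are hypothetical.

```python
def memberships(x, centers, m=2.0):
    """Standard FCM membership of value x in each cluster."""
    power = 2.0 / (m - 1.0)
    dists = [abs(x - c) for c in centers]
    zero = [d == 0.0 for d in dists]
    if any(zero):
        # x sits exactly on a center: share full membership among those centers.
        return [1.0 / sum(zero) if z else 0.0 for z in zero]
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


def update_centers(xs, u, m=2.0):
    """Each center is the mean of the data weighted by membership**m."""
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        centers.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
    return centers


def fcm(xs, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        u = [memberships(x, centers, m) for x in xs]
        centers = update_centers(xs, u, m)
    return centers, u


# Hypothetical gray levels forming two groups.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
final_centers, final_u = fcm(values, [0.0, 12.0])
print(final_centers)
print(memberships(1.0, [0.0, 4.0]))  # roughly [0.9, 0.1]
```

Because every membership row sums to 1, a pixel near a cluster boundary contributes to both centers, which is the property a spatial variant can then regularize.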

An Analysis of Replication Enhancement for a High Availability Cluster

  • Park, Sehoon;Jung, Im Y.;Eom, Heonsang;Yeom, Heon Y.
    • Journal of Information Processing Systems / v.9 no.2 / pp.205-216 / 2013
  • In this paper, we analyze a technique for building a high-availability (HA) cluster system. We propose what we term the 'Selective Replication Manager (SRM),' which improves throughput and reduces the latency of disk devices by means of a Distributed Replicated Block Device (DRBD), integrated in recent Linux kernels (version 2.6.33 or higher), while still providing HA and failover capabilities. The proposed technique can be applied to any disk replication and database system with little customization and with a reasonably low performance overhead. We demonstrate that this approach using SRM increases the disk replication speed and reduces latency by 17% and 7%, respectively, as compared to the existing DRBD solution. This approach represents a good effort to increase HA with a minimum amount of risk and cost in terms of commodity hardware.

AN IMPLEMENTATION AND EVALUATION OF RANDOMIZED-ANN SIMULATOR USING A PC CLUSTER

  • Morita, Yoshiharu;Nakagawa, Tohru;Kitagawa, Hajime
    • Proceedings of the Korea Society for Simulation Conference / 2001.10a / pp.99-102 / 2001
  • We propose a PC cluster that uses general-purpose microprocessors and a high-speed network to simulate ANN (Artificial Neural Network) processes on Linux. We apply this cluster to intelligent information processing such as ANN simulation. The elapsed time for simulating ANNs is reduced from 7,295 seconds with one PE (Processing Element) to 1,226 seconds with six PEs. The reliability of pattern classification using ANNs can be improved by the proposed Randomized-ANN. To generate a Randomized-ANN, we choose three ANNs and combine the outputs of the three networks by means of logical AND. The results are as follows: the mean correct answer rate is 94.4%, the mean wrong answer rate is only 0.1%, and the mean unknown answer rate is 5.5%. We confirm that the Randomized-ANN approach reduces the mean wrong answer rate to within one tenth and improves the reliability of Japanese coin classification.
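One plausible reading of the logical-AND combination is that a class is reported only when all three networks agree, and the input is otherwise labeled unknown. The sketch below encodes that assumption, which the abstract does not spell out, and also works out the speedup implied by the two elapsed times quoted above. The function names and coin labels are hypothetical.

```python
def combine_and(predictions, unknown="unknown"):
    """Return the shared label if every network agrees, otherwise `unknown`."""
    first = predictions[0]
    return first if all(p == first for p in predictions) else unknown


def speedup(t_serial, t_parallel):
    """Ratio of single-PE elapsed time to multi-PE elapsed time."""
    return t_serial / t_parallel


# Hypothetical outputs of three networks for two coins.
print(combine_and(["100 yen", "100 yen", "100 yen"]))
print(combine_and(["100 yen", "50 yen", "100 yen"]))

# Elapsed times quoted in the abstract: 7,295 s on one PE, 1,226 s on six PEs.
s = speedup(7295, 1226)
print(round(s, 2), round(s / 6, 3))  # speedup of about 5.95, efficiency of about 0.992
```

Under this reading, disagreement between networks moves an input from a possible wrong answer to the unknown category, which matches the low wrong-answer rate and the nonzero unknown rate reported.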

Development of Subsurface Spatial Information Model with Cluster Analysis and Ontology Model (온톨로지와 군집분석을 이용한 지하공간 정보모델 개발)

  • Lee, Sang-Hoon
    • Journal of the Korean Association of Geographic Information Studies / v.13 no.4 / pp.170-180 / 2010
  • With the development of the earth's subsurface space, the need for reliable subsurface spatial models, such as cross-sections and boring logs, is increasing. However, the ground mass is inherently uncertain, and model generation is uncertain as well because of the shortage of data and the absence of a geotechnical interpretation standard (non-statistical uncertainty), as well as field environment variables (statistical uncertainty). Therefore, data interpretation and model generation have so far been carried out by highly trained experts. In this study, a geotechnical ontology model was developed using current expert experience and knowledge, and the information content was calculated in the ontology hierarchy. After the relative distance between the information contents in the ontology model was combined with the distance between cluster centers, a cluster analysis that considered geotechnical semantics was performed. In a comparative test of the proposed method, the k-means method, and experts' interpretation, the proposed method was the most similar to the experts' interpretation, and it supports 3D-GIS visualization by easily handling massive data. We expect that the proposed method can generate a more reasonable subsurface spatial information model without the help of geotechnical experts.