• Title/Summary/Keyword: K-means cluster

Bootstrap Analysis and Major DNA Markers of BM4311 Microsatellite Locus in Hanwoo Chromosome 6

  • Yeo, Jung-Sou;Kim, Jae-Woo;Shin, Hyo-Sub;Lee, Jea-Young
    • Asian-Australasian Journal of Animal Sciences / v.17 no.8 / pp.1033-1038 / 2004
  • LOD scores for marbling scores and a permutation test were applied to detect quantitative trait loci (QTL), and BM4311 was selected as a major locus. K-means clustering was then used to mine the major DNA markers of the BM4311 microsatellite locus on Hanwoo chromosome 6: five traits were divided into three cluster groups, and the three cluster groups were classified according to six DNA markers. Finally, a bootstrap test, which calculates confidence intervals by resampling, was used to identify the major DNA markers. The study concludes that the major markers of the BM4311 locus on Hanwoo chromosome 6 are the 100 bp and 95 bp DNA markers.
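
The resampling idea behind a bootstrap confidence interval can be illustrated with a minimal sketch. It is a generic percentile bootstrap for a mean, not the procedure used in the paper; the function name `bootstrap_ci`, its parameters, and the `marbling_scores` values are hypothetical and made up for illustration.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement n_resamples times and record the statistic.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up trait values for one hypothetical marker group (not data from the paper).
marbling_scores = [2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0]
low, high = bootstrap_ci(marbling_scores)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

Comparing such intervals across marker groups is one way to judge whether the groups differ, under the assumption that the observations are independent.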

An Implementation of K-Means Algorithm improving cluster centroids decision methodologies (클러스터 중심 결정 방법을 개선한 K-Means Algorithm의 구현)

  • Cho, Si-Sung;Kim, Ho-Young;Oh, Hyung-Jin;Lee, Shin-Won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.373-376 / 2002
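  • The K-Means algorithm is a relocation technique that repeats clustering around K initial cluster centroids until K clusters are formed. By its nature, the clustering result of K-Means depends on the initial cluster centroids and on the newly generated cluster centroids. This paper proposes a modified K-Means algorithm that improves both the method for selecting the initial cluster centroids and the method for determining new cluster centroids. When the two algorithms were evaluated with the 16 weighting schemes proposed in the SMART system, the proposed modified algorithm improved recall and F-measure by more than 20% and showed better clustering performance in assigning documents to specific topics.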
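
The dependence on the initial centroids can be seen in a minimal sketch of standard (Lloyd-style) K-means. This is the plain algorithm, not the modified variant the paper proposes; the function name `kmeans` and the toy points are assumptions made for illustration.

```python
def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd-style k-means on tuples of floats; returns (centroids, labels)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Four corners of a long, flat rectangle (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Two different starting centroids lead to two different final partitions.
_, labels_a = kmeans(points, [(0.0, 0.5), (10.0, 0.5)])
_, labels_b = kmeans(points, [(5.0, 0.0), (5.0, 1.0)])
print(labels_a)  # left/right split
print(labels_b)  # bottom/top split, a poorer local optimum
```

Both runs converge, but the second start gets stuck in a partition with much larger within-cluster distances, which is the sensitivity that improved initialization methods aim to reduce.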

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a cluster analysis that classifies highways by traffic characteristics, as an alternative to existing highway functional classification methodologies. The research compares the performance of clustering techniques based on total within-group error and derives the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the nonhierarchical K-means method) and the Kohonen self-organizing maps clustering method for highway characteristic classification. The outcomes of the clustering techniques were compared in terms of the number of samples and the traffic characteristics of the subsets derived from the optimal number of clusters. Overall, the k-means method outperformed the other methods when the number of clusters was less than 12, while Kohonen self-organizing maps performed best for more than 20 clusters. The main contribution of this research is expected to be a highway characteristic classification that can serve as basic road attribute information.

Symptom Clusters in Patients with Breast Cancer (유방암 환자의 증상 클러스터)

  • Kim, Soo-Hyun;Lee, Ran;Lee, Keon-Suk
    • Korean Journal of Adult Nursing / v.21 no.6 / pp.705-717 / 2009
  • Purpose: The purpose of this study was to identify symptom clusters in patients with breast cancer and to investigate their associations with functional status and quality of life (QOL). Methods: A convenience sample of 303 patients was recruited from an oncology-specialized hospital. Results: Two distinct clusters were identified: a gastrointestinal-fatigue cluster and a pain cluster. Each cluster significantly influenced functional status and QOL. Based on these two clusters, we identified subgroups of symptom clusters using K-means cluster analysis. Three relatively distinct patient subgroups were identified in each cluster: mild, moderate, and severe. Disease-related factors (i.e., stage, metastasis, type of surgery, current chemotherapy, and anti-hormone therapy) were associated with these subgroups of symptom clusters. There were significant differences in functional status and QOL among the three subgroups. The subgroup of patients who reported high levels of symptom clusters reported poorer functional status and QOL. Conclusion: Clinicians can anticipate that breast cancer patients with advanced stage or metastasis, and those who receive mastectomy or chemotherapy, will have more intense gastrointestinal-fatigue or pain symptoms. To enhance functional status and QOL for patients with breast cancer, collective management of the symptoms in a cluster may be beneficial.

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society / v.42 no.3 / pp.25-34 / 2017
  • K-means is a popular and efficient data clustering method that uses only intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. K-means is of little use on unsupervised data without a suitable number of clusters. This paper aims to propose Group Search Optimization (GSO) using Silhouette to find the optimal data clustering solution, together with the number of clusters, for unsupervised data. Silhouette can be used as a validity index to decide the number of clusters and the optimal solution by simultaneously considering intra- and inter-cluster distances. The performance of GSO using Silhouette is validated through several experiments and analyses of data sets.
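
How the silhouette index weighs intra-cluster against inter-cluster distance can be shown with a small sketch. It only scores candidate partitions and does not implement GSO; the function name `silhouette_score`, the toy points, and the hard-coded candidate labelings (which in practice would come from a clustering run for each k) are assumptions for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Two well-separated groups of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate partitions for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],  # splits the first group unnecessarily
}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Splitting a compact group lowers the score because the split-off points sit as close to the neighboring cluster as to their own, so the k = 2 partition is preferred here.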

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • 한국데이터정보과학회:학술대회논문집 / 2005.04a / pp.185-192 / 2005
  • There are many data mining techniques, such as association rules, decision trees, neural network analysis, clustering, genetic algorithms, Bayesian networks, and memory-based reasoning. We analyze the 2003 Gyeongnam social indicator survey data for environmental information using the k-means clustering technique. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. The k-means clustering outputs can be applied to environmental preservation and environmental improvement.

Assessing the Differences in Korean View on National Economic Policy with Factor and Cluster Analysis

  • Kim, Hee-Jae;Yun, Young-Jun
    • Journal of the Korean Data and Information Science Society / v.19 no.2 / pp.451-461 / 2008
  • In this study, factor and cluster analyses were conducted to group the differences in Korean views on national economic policy in the sample of the 2006 Korean General Social Survey (KGSS). In the 2006 KGSS, 6 items on a 5-point Likert scale ask whether, or to what extent, each respondent supports specific types of governmental economic policy. First, factor analysis converted the original 6 items into 3 composite variables that account for 81% of the total variability. As the second step of the factor analysis, factor scores were computed. Then, K-means cluster analysis based on the factor scores was conducted to group the survey respondents into 3 clusters. In particular, cross-tabulation analysis showed that the distribution of the 3 clusters varies with the respondents' socio-demographic characteristics.

New Galaxy Catalog of the Virgo Cluster

  • Kim, Suk;Rey, Soo-Chang;Jerjen, Helmut;Lisker, Thorsten;Sung, Eon-Chang;Lee, Youngdae;Chung, Jiwon;Pak, Mina;Yi, Wonhyeong;Lee, Woong
    • The Bulletin of The Korean Astronomical Society / v.39 no.2 / pp.50-50 / 2014
  • We present a new catalog of galaxies in the wider region of the Virgo cluster, based on the Sloan Digital Sky Survey (SDSS) Data Release 7. The Extended Virgo Cluster Catalog (EVCC) covers an area of 725 deg$^2$ or 60.1 Mpc$^2$. It is 5.2 times larger than the footprint of the classical Virgo Cluster Catalog (VCC) and reaches out to 3.5 times the virial radius of the Virgo cluster. We selected 1324 spectroscopically targeted galaxies with radial velocities less than 3000 km s$^{-1}$. In addition, 265 galaxies that have been missed in the SDSS spectroscopic survey but have available redshifts in the NASA Extragalactic Database are also included. Our selection process secured a total of 1589 galaxies of which 676 galaxies are not included in the VCC. The certain and possible cluster members are defined by means of redshift comparison with a cluster infall model. We employed two independent and complementary galaxy classification schemes: the traditional morphological classification based on the visual inspection of optical images and a characterization of galaxies from their spectroscopic features. SDSS u, g, r, i, and z passband photometry of all EVCC galaxies was performed using Source Extractor. We compare the EVCC galaxies with the VCC in terms of morphology, spatial distribution, and luminosity function. The EVCC defines a comprehensive galaxy sample covering a wider range in galaxy density that is significantly different from the inner region of the Virgo cluster. It will be the foundation for forthcoming galaxy evolution studies in the extended Virgo cluster region, complementing ongoing and planned Virgo cluster surveys at various wavelengths.

Cluster Analysis of PM10 Concentrations from Urban Air Monitoring Network in Korea during 2000 to 2005 (전국 도시대기 측정망의 2000~2005년 PM10 농도 군집분석)

  • Han, Ji-Hyun;Lee, Mee-Hye;Ghim, Young-Sung
    • Journal of Korean Society for Atmospheric Environment / v.24 no.3 / pp.300-309 / 2008
  • Variations in PM10 concentration between 2000 and 2005 at 84 urban air monitoring stations operated by the government were analyzed. K-means cluster analysis was performed using the annual average and the 99th percentile of daily averages as parameters. The results obtained by excluding Asian dust episode days were compared with those obtained by using all available data. In both cases, the cluster with the highest mean concentration was mostly composed of stations in Seoul and Gyeonggi. The annual average of the cluster with the highest mean concentration showed a distinct decreasing trend, but the annual average excluding Asian dust episode days did not show such a trend. Without Asian dust episode days, the high monthly average concentrations in March and April were also not observed. The effect of Asian dust was more pronounced in the 99th percentile of daily averages. The 99th percentile of daily averages of the cluster with the highest mean concentration was highest in June, following dips in April and May.

An Analysis of Children's Creative Thinking Styles According to Cluster Analysis (군집분석을 이용한 아동의 창의적 사고유형 분석)

  • Kim, Kyoung Eu;Kim, Eun A;Kim, Seong Hui
    • Korean Journal of Child Studies / v.35 no.2 / pp.103-115 / 2014
  • This study explored the creative thinking styles of children using cluster analysis and examined differences between the resulting groups by gender. The participants were 250 elementary school students living in Seoul, Korea. Data were analyzed by means of cluster analysis and the $\chi^2$ test. The results of the cluster analysis, based on the scores on the sub-factors of the TTCT (Torrance Test of Creative Thinking), suggested the existence of four clusters ('Non-creative', 'Divergent creative', 'Elaborate creative', 'Multiple creative'). Additionally, the four clusters were found to be differentiated according to gender.