• Title/Summary/Keyword: cluster analysis (군집분석)

Impact Analysis of Partition Utility Score in Cluster Analysis (군집분석의 분할 유용도 점수의 영향 분석)

  • Lee, Gye Sung
    • The Journal of the Convergence on Culture Technology, v.7 no.3, pp.481-486, 2021
  • Machine learning algorithms adopt a criterion function as a key component for measuring the quality of the model derived from data. Cluster analysis also uses such a function to rate clustering results. In general, every criterion function shows some favoritism toward certain types of clusters when judging which ones are of high quality. The resulting clusters are then described by attributes and their values. Category utility and partition utility play an important role in cluster analysis, and this research analyzes both in detail, particularly in terms of how they relate to the favoritism in the final results. Several data sets are selected and analyzed to show how these criterion functions induce different results.
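  The abstract names category utility without stating it. For reference, the sketch below computes the standard category utility of a partition of nominal records, CU = (1/K) Σ_k P(C_k) [Σ_i Σ_j P(A_i = V_ij | C_k)² − Σ_i Σ_j P(A_i = V_ij)²], the form used in COBWEB-style conceptual clustering. It is an illustrative Python sketch, not the paper's implementation; partition utility is omitted because the abstract does not define it, and the toy records are made up.

```python
from collections import Counter


def sum_sq_probs(records):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `records`."""
    n = len(records)
    total = 0.0
    for i in range(len(records[0])):
        counts = Counter(rec[i] for rec in records)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(clusters):
    """CU = (1/K) * sum_k P(C_k) * [sum_ij P(A_i=V_ij | C_k)^2 - sum_ij P(A_i=V_ij)^2]."""
    clusters = [c for c in clusters if c]  # empty clusters carry no probability mass
    everything = [rec for c in clusters for rec in c]
    n = len(everything)
    baseline = sum_sq_probs(everything)  # expected correct guesses without clusters
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in clusters)
    return gain / len(clusters)


# Hypothetical nominal records: (colour, shape)
good = [[("red", "round"), ("red", "round")], [("blue", "long"), ("blue", "long")]]
mixed = [[("red", "round"), ("blue", "long")], [("red", "round"), ("blue", "long")]]
print(category_utility(good), category_utility(mixed))  # 0.5 0.0
```

  Dividing by K keeps CU from rewarding ever finer partitions: without it, putting every record in its own cluster would maximize the bracketed term. That built-in preference is an example of the kind of favoritism the abstract refers to.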

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics, v.22 no.4, pp.745-758, 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures have been suggested for validating the results of a cluster analysis. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering, and compute validity measures such as connectivity, the Dunn index, and the silhouette on simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull beef cooked by the BBQ method.
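  Two of the validity measures named above can be sketched in a few lines. The Python below assumes Euclidean distance and computes the mean silhouette width and the Dunn index (smallest between-cluster distance divided by largest cluster diameter); connectivity is left out, and the four-point data set is made up. Packages such as R's clValid report all three measures with tested implementations.

```python
import math


def group_by_label(points, labels):
    """Collect the points of each cluster label into a list."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def silhouette(points, labels):
    """Mean silhouette width; s(i) = (b - a) / max(a, b). Needs at least two clusters."""
    groups = group_by_label(points, labels)
    widths = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:  # convention: singletons get s(i) = 0
            widths.append(0.0)
            continue
        # a: mean distance to the other members of p's cluster (distance to itself is 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        m = max(a, b)
        widths.append(0.0 if m == 0 else (b - a) / m)
    return sum(widths) / len(widths)


def dunn_index(points, labels):
    """Smallest between-cluster point distance divided by largest cluster diameter."""
    groups = list(group_by_label(points, labels).values())
    diameter = max(math.dist(p, q) for g in groups for p in g for q in g)
    separation = min(math.dist(p, q)
                     for i, g in enumerate(groups) for h in groups[i + 1:]
                     for p in g for q in h)
    return separation / diameter


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labs = [0, 0, 1, 1]
print(round(silhouette(pts, labs), 3), dunn_index(pts, labs))  # 0.802 5.0
```

  For both measures, larger values indicate compact, well-separated clusters; here the two groups are far apart relative to their size, so the silhouette is about 0.8 and the Dunn index is 5.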

A Comparative Study of Cluster Analysis Algorithms in Data Mining (데이터 마이닝에서의 군집분석 알고리즘 비교 연구)

  • Lee, Yeong-Seop;An, Mi-Yeong
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집), pp.19-25, 2003
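  • Merely describing the patterns or relationships inherent in a database can provide information needed for decision making; finding patterns by dividing the data, based on their variables, into small groups with similar characteristics is called cluster analysis. Cluster analysis methods include partitioning methods and hierarchical methods, and this paper compares several algorithms of the partitioning approach, which allows objects to be reassigned. Partitioning algorithms include k-means, which uses the mean as the cluster center, and PAM, CLARA, and CLARANS, which use medoids as centers. We explain the theory behind these algorithms and their advantages and disadvantages, and compare them in terms of variance and the average distance between centers.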
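  To make the mean-versus-medoid distinction concrete, here is a minimal PAM-style k-medoids sketch in Python. It starts from given medoid indices (real PAM chooses them with a BUILD step, omitted here) and repeatedly applies the single medoid/non-medoid swap that lowers the total distance the most. CLARA, which runs PAM on random subsamples, and CLARANS, which searches randomly sampled swaps, are not shown. The data are made up.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist, init=None):
    """PAM-style k-medoids (swap phase only).

    Starts from medoid indices `init` (default: the first k points) and keeps
    applying the single medoid/non-medoid swap that lowers the total cost most.
    """
    medoids = list(init) if init is not None else list(range(k))
    cost = total_cost(points, medoids, dist)
    while True:
        best_cost, best_i, best_h = cost, None, None
        for i, h in product(range(k), range(len(points))):
            if h in medoids:
                continue
            trial = medoids[:i] + [h] + medoids[i + 1:]
            c = total_cost(points, trial, dist)
            if c < best_cost:
                best_cost, best_i, best_h = c, i, h
        if best_i is None:  # no swap improves the cost: local optimum reached
            break
        medoids[best_i] = best_h
        cost = best_cost
    # assign each point to its nearest medoid
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return medoids, labels, cost


data = [1.0, 2.0, 4.0, 20.0, 21.0, 23.0]
medoids, labels, cost = pam(data, 2, lambda a, b: abs(a - b))
print(sorted(data[m] for m in medoids), labels, cost)  # [2.0, 21.0] [1, 1, 1, 0, 0, 0] 6.0
```

  The medoids are always actual observations (2 and 21 here), whereas k-means on the same groups would report the non-observed means 7/3 and 64/3.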

Characterization of Cities in Seoul Metropolitan Area by Cluster Analysis (군집분석을 이용한 수도권 도시의 유형화에 관한 연구)

  • Song, Min-Kyung;Chang, Hoon
    • Journal of Korean Society for Geospatial Information Science, v.18 no.1, pp.83-88, 2010
  • This paper analyzes the Seoul metropolitan area on the basis of cluster characteristics in order to understand the traits of each cluster. To model the area, 10 indicators were selected from the components of a city, such as population, activities, land, and facilities. Principal component analysis was then used to derive common factors that capture similar characteristics or affinities among the variables. Hierarchical clustering was applied to the resulting factor scores, and the metropolitan area was finally grouped into five clusters.
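  The PCA step can be sketched with NumPy: standardize the indicators, take the SVD, and keep the leading component scores as inputs to clustering. The random 30 × 10 matrix is only a stand-in for the paper's 10 indicators, and the number of retained components is an assumption; the abstract does not say whether the factors were rotated.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int):
    """Return principal-component scores of the column-standardized X and the
    share of total variance explained by each retained component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the component directions (loadings), ordered by singular value
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))  # placeholder: 30 cities x 10 indicators
scores, explained = pca_scores(X, n_components=3)
print(scores.shape, explained.round(3))
```

  The scores could then be passed to a hierarchical method, for example `scipy.cluster.hierarchy.linkage(scores, method="ward")` followed by `fcluster(Z, 5, criterion="maxclust")` to cut the tree into five groups; the abstract does not state which linkage the authors used.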

Cluster analysis by month for meteorological stations using a gridded data of numerical model with temperatures and precipitation (기온과 강수량의 수치모델 격자자료를 이용한 기상관측지점의 월별 군집화)

  • Kim, Hee-Kyung;Kim, Kwang-Sub;Lee, Jae-Won;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society, v.28 no.5, pp.1133-1144, 2017
  • Cluster analysis of meteorological data makes it possible to segment meteorological regions based on their meteorological characteristics. However, observed meteorological data are not well suited to cluster analysis because the stations that observe them are not uniformly located, so clustering the observed data cannot properly reflect the climate characteristics of South Korea. Clustering 5 km × 5 km gridded data derived from a numerical model, on the other hand, reflects them evenly. In this study, we analyzed long-term gridded temperature and precipitation data using cluster analysis. Because climate characteristics differ from month to month, clustering was performed by month. Since K-means results are highly sensitive to initial values, we used initial values obtained from Ward's method, a hierarchical clustering method. The clusters of the meteorological stations were then determined from the clustering of the gridded data. As a result, the meteorological stations in South Korea were segmented spatio-temporally.
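  The Ward-then-K-means initialization described above can be sketched in plain Python. A naive agglomerative Ward pass merges the pair of clusters whose union increases the within-cluster sum of squares least, Δ(A, B) = |A||B| / (|A| + |B|) · ‖c_A − c_B‖², until k clusters remain, and its centroids seed Lloyd's K-means. The points are hypothetical stand-ins for (temperature, precipitation) vectors of grid cells, and the naive Ward loop is only practical for small inputs, not a national 5 km grid.

```python
def mean_point(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_centroids(points, k):
    """Naive agglomerative Ward clustering down to k clusters; returns their centroids."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward merge cost: increase in within-cluster sum of squares
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean_point(a), mean_point(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [mean_point(c) for c in clusters]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else c)  # keep empty cluster's centroid
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids


grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(grid, ward_centroids(grid, k=2))
print(labels)  # [0, 0, 0, 1, 1, 1]
```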

A Study on Cluster Analysis Algorithms for Web Log Data (웹로그 데이터에 대한 군집분석 알고리즘에 관한 연구)

  • Gang, Hyeon-Cheol;Han, Sang-Tae;Seon, Yeong-Su
    • Proceedings of the Korean Statistical Society Conference, pp.313-318, 2003
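  • The Internet has recently come to be regarded not only as a new means for companies to reach customers, promote themselves, and provide services, but also as an important business tool. Accordingly, various techniques have been proposed for understanding how visitors use websites, and methods for analyzing web log data are being studied in many academic fields. In this study, we propose a distance measure and an analysis algorithm for clustering web log data, and examine the characteristics of the proposed algorithm by applying it to real data.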
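  The abstract does not specify the proposed distance measure. Purely to illustrate the kind of input such clustering needs, the Python sketch below turns (session, page) log records into page sets and compares sessions with the Jaccard distance, a common generic choice that is not necessarily the authors' measure. The log entries and function names are hypothetical.

```python
def sessions_from_log(log):
    """Group (session_id, page) log records into a set of visited pages per session."""
    sessions = {}
    for session_id, page in log:
        sessions.setdefault(session_id, set()).add(page)
    return sessions


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical page sets, 1 for disjoint ones."""
    union = a | b
    if not union:  # two empty sessions are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


log = [("s1", "/home"), ("s1", "/products"),
       ("s2", "/home"), ("s2", "/products"), ("s2", "/cart"),
       ("s3", "/faq")]
sessions = sessions_from_log(log)
print(jaccard_distance(sessions["s1"], sessions["s2"]),
      jaccard_distance(sessions["s1"], sessions["s3"]))
```

  Because it ignores page order and time spent on each page, this distance is only a baseline; a pairwise distance matrix built from it could be fed to a medoid-based or hierarchical method.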

Classification of universities in Daegu·Gyungpook by support vector cluster analysis (서포트벡터 군집분석을 이용한 대구·경북지역 대학의 분류)

  • Park, Hye Jung;Kim, Jong Tae
    • Journal of the Korean Data and Information Science Society, v.24 no.4, pp.783-791, 2013
  • The website of the College Information Disclosure Center lists sixteen "College Information" indicators. Among them, this study examined the enrollment rate and the employment rate based on health insurance coverage, focusing on twenty-four universities in the Daegu and Gyeongbuk area. The universities were classified into groups by enrollment rate and employment rate, and the characteristics of each group were investigated. Hierarchical cluster analysis and support vector cluster analysis were conducted to analyze the characteristics of the groups statistically.

The Study of selection method about Elderly Pedestrian Hotspot by Cluster Analysis (군집분석을 통한 노인 보행자 사고 취약지역 선정방법에 관한 연구)

  • Ko, Eun-Hyeck;Yoon, Byoung-Jo;Park, Hyung-Geun;Yang, Sung-Ryong
    • Proceedings of the Korean Society of Disaster Information Conference, pp.193-194, 2015
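  • This study used factor analysis to compute component scores representing the types of elderly pedestrian accidents, and then performed cluster analysis to establish a model for selecting areas vulnerable to such accidents. Previous studies of elderly pedestrian accidents analyzed the walking environment and accident characteristics and then offered policy recommendations on institutional and physical improvements. Rather than analyzing the characteristics of elderly pedestrians to actually reduce accidents, such studies largely cited commonly known facts or overseas cases and merely drew some attention to the current state of elderly pedestrian accidents. Selecting vulnerable areas through cluster analysis, by contrast, provides a new criterion that allows the characteristics of elderly pedestrian accidents to be compared clearly. Departing from existing methodology, this study identified areas where measures to prevent elderly pedestrian accidents are urgently needed, and it is expected to stimulate active research on the topic. The key advantage of cluster analysis here is that it can select clusters in which both the number of deaths and the fatality rate are relatively high, and comparing regional characteristics is expected to enable further research on elderly pedestrian accidents.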
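  The final selection step, picking clusters in which deaths and fatality rates are both relatively high, can be sketched as below. "Relatively high" is interpreted here, as an assumption, as a cluster mean above the overall mean for both variables; the function name, region values, and cluster labels are invented for illustration.

```python
from statistics import mean


def high_risk_clusters(deaths, rates, labels):
    """Return cluster labels whose mean deaths AND mean fatality rate both exceed
    the overall means (a hypothetical reading of 'relatively high')."""
    overall_d, overall_r = mean(deaths), mean(rates)
    flagged = []
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        if (mean(deaths[i] for i in idx) > overall_d
                and mean(rates[i] for i in idx) > overall_r):
            flagged.append(lab)
    return flagged


deaths = [12, 10, 2, 3, 11, 1]            # hypothetical deaths per region
rates = [3.0, 2.5, 0.5, 2.8, 0.4, 0.6]    # hypothetical fatality rates per region
labels = [0, 0, 1, 2, 2, 1]               # cluster label of each region
print(high_risk_clusters(deaths, rates, labels))  # [0]
```

  In this example cluster 2 has above-average deaths but a below-average fatality rate, so only cluster 0 is flagged.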

Clustering of Time-Course Microarray Data Using Pharmacokinetic Parameter (약동학적 파라미터를 이용한 시간경로 마이크로어레이 자료의 군집분석)

  • Lee, Hyo-Jung;Kim, Peol-A;Park, Mi-Ra
    • The Korean Journal of Applied Statistics, v.24 no.4, pp.623-631, 2011
  • A major goal of time-course microarray data analysis is to detect groups of genes that show similar expression patterns over time, and numerous clustering algorithms have accordingly been developed for such data. In this study, we propose a clustering method based on the primary pharmacokinetic parameters used in pharmacokinetic studies to assess the pharmaceutical equivalence of two drug products. Real data and simulated data were used to demonstrate the usefulness of the proposed method.
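  One way to read "primary pharmacokinetic parameters" is AUC and Cmax, the parameters conventionally compared in bioequivalence studies, with Tmax as a secondary summary. The sketch below computes them for a single time-course profile, treating expression values like a concentration-time curve; which parameters and transformations the authors actually used is not stated in the abstract, and the profile is made up.

```python
def pk_parameters(times, values):
    """AUC (linear trapezoidal rule), Cmax and Tmax of one time-course profile."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching times and values with at least two points")
    # area of each trapezoid between consecutive sampling times
    auc = sum((t1 - t0) * (v0 + v1) / 2
              for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
    cmax = max(values)
    tmax = times[values.index(cmax)]  # first time at which the maximum occurs
    return {"AUC": auc, "Cmax": cmax, "Tmax": tmax}


print(pk_parameters([0, 1, 2, 4], [0.0, 2.0, 3.0, 1.0]))
# {'AUC': 7.5, 'Cmax': 3.0, 'Tmax': 2}
```

  Computing these for every gene yields a short feature vector per gene that a standard algorithm such as k-means could then cluster.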

Cluster analysis with Korean weather data: Application of model-based Bayesian clustering method (한국 기상자료의 군집분석: 베이지안 모델기반 방법의 응용)

  • Joo, Yong-Sung;Jung, Hyung-Joo;Kim, Byung-Jun
    • Journal of the Korean Data and Information Science Society, v.20 no.1, pp.57-64, 2009
  • In this paper, 30 major cities are clustered based on precipitation, temperature, wind speed, photoperiod, and humidity. We found that the resulting clusters are strongly related to geographical location. These results make sense because, although Korea is a small country, its weather is known to be strongly local. The largest number of clusters was found when wind speed was used as the clustering variable, and the smallest when photoperiod was used. The large number of clusters based on wind speed indicates that wind speed is easily affected by local geography.
