• Title/Summary/Keyword: K-평균.군집분석 (K-means cluster analysis)

Automated K-Means Clustering and R Implementation (자동화 K-평균 군집방법 및 R 구현)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.723-733 / 2009
  • The crucial problems in K-means clustering are choosing the number of clusters and the initial cluster centroids. K-means clustering is therefore generally carried out as a two-stage procedure: the first stage runs hierarchical clustering to obtain the number of clusters and the cluster centroids, and the second stage runs nonhierarchical K-means clustering using the results of the first stage. Here we provide an automated K-means clustering procedure for obtaining initial cluster centroids, which is also useful for large data sets, and provide a software program implemented in R.
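
The two-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' R program, and it makes several assumptions that are not in the abstract: the number of clusters `k` is passed in by hand, stage 1 uses centroid-linkage agglomeration, and the function names and sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def hierarchical_centroids(points, k):
    """Stage 1 (assumed variant): merge the two clusters with the closest
    centroids until k clusters remain, then return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(mean_point(clusters[i]), mean_point(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean_point(c) for c in clusters]

def k_means(points, initial_centroids, max_iter=100):
    """Stage 2: Lloyd's iterations started from the stage-1 centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_point(members)
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
initial = hierarchical_centroids(points, k=2)
labels, centroids = k_means(points, initial)
print(labels)
```

The pairwise search in stage 1 is quadratic per merge, so for large data sets one would presumably run it on a subsample; how the paper handles that case is not stated in the abstract.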

Seismotectonic zoning by K-means clustering analysis in the Korean Peninsula (K-평균 군집분석에 의한 한반도에서의 지진지체구조구 구분)

  • Kim, Sung Kyun;Jeon, Jeong Soo;Jun, Myung-Soon
    • Journal of the Geological Society of Korea / v.53 no.5 / pp.703-714 / 2017
  • It is not easy to identify seismic source zones for use in probabilistic seismic hazard analysis in an intraplate region, and there is no unique formal procedure for developing and evaluating seismic source models. In this study, K-means cluster analysis is applied to seismicity data, treated as point sources, to delineate a seismotectonic model for the Korean Peninsula. The number of clusters K determined by the KL index and the elbow method is five and nine, respectively. A seismotectonic model composed of five source zones is developed, and an alternative model with nine zones is also proposed. Seismicity parameters estimated for each zone are presented.
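
As a rough illustration of how K might be chosen with these two criteria, the sketch below computes the within-cluster sum of squares (WCSS) for several K, which is the curve the elbow method inspects, and a KL index from it. It assumes "KL index" means the Krzanowski-Lai index, KL(k) = |DIFF(k) / DIFF(k+1)| with DIFF(k) = (k-1)^(2/p) W(k-1) - k^(2/p) W(k) for p-dimensional data; the abstract does not spell this out. The toy points, the deterministic farthest-first initialisation, and all function names are hypothetical and unrelated to the seismicity catalogue.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def farthest_first(points, k):
    """Deterministic initial centroids: start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen

def wcss(points, k, max_iter=100):
    """Run K-means for a given k and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_point(members)
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

def kl_index(w, p):
    """Krzanowski-Lai index (assumed definition) for every k whose neighbours
    k-1 and k+1 are also present in w, a dict mapping k to WCSS."""
    def diff(k):
        return (k - 1) ** (2 / p) * w[k - 1] - k ** (2 / p) * w[k]
    return {k: abs(diff(k) / diff(k + 1)) for k in w if k - 1 in w and k + 1 in w}

# Hypothetical 2-D points forming three compact groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
w = {k: wcss(points, k) for k in range(1, 6)}
kl = kl_index(w, p=2)
best_k = max(kl, key=kl.get)
print(w)
print(best_k)
```

On real data the two criteria need not agree, as the abstract itself reports (five versus nine zones), so the WCSS curve is usually inspected alongside any single index.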

A Reinterpretation of Multiple Correspondence Analysis Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 다중대응분석의 재해석)

  • 김경희;최용석
    • Proceedings of the Korean Statistical Society Conference / pp.175-178 / 2001
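  • Multiple correspondence analysis, which graphically displays the correspondence among categories in a multi-way contingency table, generally yields a two-dimensional correspondence plot with low goodness-of-fit, because the principal inertias account for only a small share of the total inertia. To overcome this, Benzecri's formula can be used to raise the low principal inertias and obtain a new two-dimensional correspondence plot. However, this new plot still does not show the correspondence among categories clearly (Greenacre and Blasius, 1994, chapter 10). Andrews plots can be used to reinterpret multiple correspondence analysis by clustering the categories, but interpretation becomes difficult when the number of categories is large. In this note, we use K-means cluster analysis to make multiple correspondence analysis easier to interpret in such cases.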
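
For readers unfamiliar with the Andrews plot mentioned in this abstract, the sketch below evaluates the standard Andrews curve f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... for one coordinate vector. The category names and coordinates are hypothetical; the abstract does not say which coordinates the authors plot.

```python
import math

def andrews_curve(x, t):
    """Value of the Andrews curve of coordinate vector x at angle t."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += xi * math.sin(harmonic * t)
        else:
            value += xi * math.cos(harmonic * t)
    return value

# Hypothetical category coordinates on three principal axes.
categories = {
    "A1": [0.9, 0.1, -0.2],
    "A2": [0.8, 0.2, -0.1],
    "B1": [-0.7, 0.5, 0.3],
}

# Evaluate each curve on a grid over [-pi, pi]; categories whose curves stay
# close together over the whole grid are candidates for the same cluster.
grid = [-math.pi + 2 * math.pi * k / 100 for k in range(101)]
curves = {name: [andrews_curve(x, t) for t in grid] for name, x in categories.items()}
```

With many categories the curves overlap heavily, which is the interpretive difficulty the abstract points to.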

A Study on the Implementation of Walking Environment Projects by Analyzing Characteristics of Pedestrian Accidents by Local Government Types (지방자치단체의 유형별 보행자사고 특성분석 및 보행환경조성사업 개선방안 연구)

  • Park, Jinkyung;Han, Myungjoo
    • Journal of Korean Society of Transportation / v.32 no.6 / pp.615-627 / 2014
  • In this study, nonhierarchical K-means cluster analysis is used to classify 230 local governments into types, and the Mann-Whitney U test and the Kruskal-Wallis test are used to analyze the characteristics of pedestrian accidents by region type. Based on an empirical analysis of pedestrian accidents, the study suggests improvements to walking environments that reflect local characteristics. The cluster analysis yielded three types: Type 1-A (relatively dominant urban commercial areas), Type 1-B (predominantly urban residential areas) and Type 2 (rural areas). According to the results, more than 60% of pedestrian accidents occurred on community roads for all types, and the incidence rate in rural areas was higher than in urban areas. In addition, pedestrian accidents at intersections and crossings occurred more frequently in Type 1-B than in Type 2, while the number of roadside casualties was highest for Type 2.
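
The two rank-based tests named in this abstract compare groups without assuming normality. The sketch below computes only the test statistics (the Mann-Whitney U and the Kruskal-Wallis H) from average ranks, with no tie correction and no p-value; the sample values are invented for illustration and are not the study's data.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic (the smaller of U1 and U2) for two samples."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return min(u1, u2)

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for two or more samples (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Invented accident rates for regions of each type (illustration only).
type_1a = [0.58, 0.63, 0.61, 0.66]
type_1b = [0.62, 0.71, 0.67, 0.69]
type_2 = [0.74, 0.81, 0.78, 0.77]

u = mann_whitney_u(type_1b, type_2)
h = kruskal_wallis_h([type_1a, type_1b, type_2])
```

In practice the statistics are compared against their reference distributions (or a library routine is used) to obtain p-values, and a tie correction is applied when many values coincide.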

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. The hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before extracting the daily demand pattern. We considered k-means clustering, Gaussian mixture model clustering and functional clustering in order to find the best clustering method. Classification analysis was then conducted to understand the relationship between the clusters and external factors: day of the week, holidays and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels for 2008 to 2011, and the test data were the daily external factors for 2012. Decision tree, random forest, support vector machine and naive Bayes classifiers were used. As a result, Gaussian model-based clustering combined with random forest showed the best prediction performance when the number of clusters was 8.
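
The abstract does not say how the time trend was removed. One common choice, shown here purely as an assumption, is to fit a straight line by least squares and keep the residuals; the demand values and the function name are hypothetical.

```python
def remove_linear_trend(values):
    """Subtract the least-squares line fitted against the index 0..n-1.
    Requires at least two observations."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(values)]

# Hypothetical demand series: a slow upward drift plus an alternating pattern.
demand = [100 + 0.5 * t + (5 if t % 2 else -5) for t in range(24)]
residuals = remove_linear_trend(demand)
```

After detrending, each day's 24 hourly residuals can be treated as one observation vector for the clustering step.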

A Study on Classifications and Characteristics of Declined Rural Area in Chungcheong Region (충청권 농촌지역 쇠퇴 특성 및 유형에 관한 연구)

  • Jo, Jin-Hee;Park, Hyung-Keun;Mo, Hye-Ran;Lee, Han-Soo
    • KSCE Journal of Civil and Environmental Engineering Research / v.35 no.1 / pp.203-215 / 2015
  • This study aims to identify the degree and types of spatial decline in Si/Gun and Eup/Myun units within the Chungcheong region of South Korea, as a contribution to ongoing efforts to diagnose rural decline and potential. To this end, we analyzed 27 Sis and Guns to identify the degree of decline and the potential of rural areas in the region. We also carried out a diagnosis and K-means clustering on 274 Eups and Myuns, which are smaller administrative units, to determine the types and characteristics of rural decline. In the analysis of the Sis and Guns, a relatively high degree of rural decline was found in Cheongyang, Seocheon and Taean in Chungcheongnam-do, and in Danyang and Goisan as well as Boeun, Okcheon and Youngdong in Chungcheongbuk-do; the last three are collectively known as the 'Southern 3 Areas in Chungcheongbuk-do' for their high degree of rural decline. The cluster analysis carried out on the 166 Eups and Myuns yielded five distinct clusters: areas with housing deterioration (29), areas with a poor economic foundation (16), areas with poor accessibility to central areas (42), areas with a poor residential environment (51) and areas with an aged population (28). These findings are likely to serve as a basis for designing and implementing future rural revitalization policies. We also recommend a more comprehensive diagnosis from a community-level perspective and further discussion of policy suggestions and strategies tailored to rural communities.

Cluster Analysis of the 1000-hPa Height Field around the Korean Peninsula (한반도 주변 1000-hPa 고도장의 군집분석)

  • Jeong, Young-Kun
    • Journal of the Korean Earth Science Society / v.33 no.4 / pp.337-349 / 2012
  • In this study, we classify the 1000-hPa geopotential height fields around the Korean peninsula using K-means cluster analysis and investigate the occurrence characteristics of each cluster pattern. Using pattern correlation as the similarity measure among clusters, with a similarity criterion of 0.8, 11 clusters are identified as typical pressure patterns. Three of these are associated with the extension of the Siberian air mass; another three with the latitudes of the longest symmetry axis of the North Pacific highs; two with a trough lying largely under the Siberian or North Pacific air mass; and the remaining three are migratory high patterns, generally occurring in spring and autumn, which are distinguished by the direction of the longest symmetry axis of the highs. The occurrence rate of air masses affecting the Korean peninsula, estimated from the number of occurrence days of the 11 pressure patterns, is 55.4% Siberian, 29.3% North Pacific, 12.8% Yangtze River and 2.5% Okhotsk Sea, and continental air masses account for 68.2% of the total. The wintertime pressure patterns around the Korean peninsula are nearly the opposite of the summertime patterns, each dominated by highs extending from the stationary air masses over Central Siberia and the North Pacific Ocean, respectively. The migratory highs occur largely in spring and autumn, during the transition from wintertime to summertime patterns or vice versa. Recently, the occurrence frequency of highs extending from the North Pacific has been decreasing. Meanwhile, wintertime pressure patterns occur frequently in spring and autumn, the frequency of pressure patterns with a trough is increasing, and migratory highs occur in nearly all seasons.
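
The similarity criterion in this abstract can be illustrated with a small sketch. It assumes that pattern correlation means the centered Pearson correlation between two fields flattened over the same grid points, and it groups patterns greedily against the first member of each group; both choices are assumptions, as is the tiny four-point "grid" used as data.

```python
import math

def pattern_correlation(a, b):
    """Centered Pearson correlation between two fields on the same grid."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def group_similar(patterns, threshold=0.8):
    """Greedy grouping (assumed scheme): put each pattern into the first group
    whose first member correlates with it at or above the threshold;
    otherwise start a new group. Returns lists of pattern indices."""
    groups = []
    for idx, field in enumerate(patterns):
        for group in groups:
            if pattern_correlation(patterns[group[0]], field) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups

# Hypothetical height fields (metres) on a four-point grid.
patterns = [
    [100.0, 110.0, 120.0, 130.0],
    [102.0, 111.0, 123.0, 129.0],  # nearly the same gradient as the first
    [130.0, 120.0, 110.0, 100.0],  # reversed gradient
]
groups = group_similar(patterns)
print(groups)
```

Because the correlation is centered, two fields with the same shape but different mean heights count as the same pattern under this measure.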

Smart elevator operation management mobile application design using clustering techniques (클러스터링 기법을 이용한 스마트 엘리베이터 운영관리 모바일 어플리케이션 설계)

  • Park, Hung-Bog;Son, Gwan-yeong;Choi, Geum-kang;Hwang, You-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / pp.661-662 / 2016
  • Contemporary buildings are becoming ever taller. In line with this trend, we propose a smart elevator operation management mobile application, designed using clustering techniques, to improve users' elevator access control and convenience. Applying clustering to the elevator's call data makes it possible to find the elevator's best position, and its state can also be checked from a smartphone. As a result, users' waiting time can be reduced and their convenience improved through a more efficient remote control system.

Comparative Study on Shopping Behavior of Korean Overseas Tourist Groups Based on Travel Motivation (여행동기에 따른 해외여행자 집단별 쇼핑행동 비교)

  • Jeon, Yangjin
    • Journal of the Korea Fashion and Costume Design Association / v.17 no.1 / pp.25-37 / 2015
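  • The purpose of this study is to segment travelers into groups according to their overseas travel motivation and to compare, across groups, the characteristics of the products they purchase and the stores they use while traveling abroad. Key items on travel motivation, the types and attributes of purchased products, and the types and attributes of shopping venues were drawn from a literature review. A survey was conducted of 431 people in their 20s to 50s with overseas travel experience, and K-means cluster analysis identified four clusters: an active group, a passive group, a nature and pleasure-seeking group, and a family and discovery-seeking group. Active travelers showed the highest interest in every product category purchased abroad, differing significantly from the other three groups. In particular, they placed more importance on purchasing fashion luxury goods and souvenirs than passive or nature and pleasure-seeking travelers did. Among product attributes, they also gave weight to factors such as design and reputation, practicality, and price and quality. As for shopping venues, the active group preferred local markets, fashion stores and gift shops, in that order, whereas passive travelers showed a greater preference for fashion stores. Venue attributes were ranked in importance as convenience, display, and store location and promotional activities, and active travelers showed significantly more interest in store convenience than travelers in the other three groups. The family and discovery-seeking group and the nature and pleasure-seeking group had similar shopping behavior, differing on only some factors. Passive travelers, unlike the other three groups, showed low interest in all shopping behaviors. The results show that market segmentation based on travel motivation has the discriminating power to predict distinct shopping behaviors.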

Exploratory Analysis of Gene Expression Data Using Biplot (행렬도를 이용한 유전자발현자료의 탐색적 분석)

  • Park, Mi-Ra
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.355-369 / 2005
  • Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need statistical analysis. Visualization is an effective analytic technique that exploits the ability of the human brain to process large amounts of data. In this study, a biplot approach is applied to microarray data to examine the relationship between genes and samples. A supplementary data method for classifying a new sample into a known category is suggested. The methods are validated by applying them to well-known microarray data sets such as Golub et al. (1999), Alizadeh et al. (2000) and Ross et al. (2000). The results are compared with those of several clustering methods. A modified graph that combines a partitioning method with the biplot is also suggested.