• Title/Summary/Keyword: 군집성 (clustering)


Pairwise fusion approach to cluster analysis with applications to movie data (영화 데이터를 위한 쌍별 규합 접근방식의 군집화 기법)

  • Kim, Hui Jin;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.265-283 / 2022
  • MovieLens data consist of recorded movie ratings and are often used in recommendation system research to measure evaluation scores. In this paper, we provide additional information obtained by clustering users' genre preferences using movie rating data and movie genre data. Because each user rates very few movies relative to the total number of movies, the missing rate in these data is very high, which limits the applicability of existing clustering methods. Motivated by the analysis of the MovieLens data, we propose a convex clustering-based method using a pairwise fused penalty. In particular, the proposed method performs missing-value imputation and, at the same time, uses movie ratings and per-movie genre weights to cluster the genre preferences of each individual. We solve the proposed optimization problem with the alternating direction method of multipliers (ADMM) algorithm. Simulation studies and an application to the MovieLens data show that the proposed clustering method is less sensitive to noise and outliers than existing methods.
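To make the idea of convex clustering with a pairwise fused penalty and missing entries concrete, here is a minimal ADMM sketch. It minimizes 0.5 * (squared error over observed entries only) + gamma * sum over pairs of w_ij * ||u_i - u_j||_2, so unobserved entries are imputed by the fitted centroid matrix U. This is a generic textbook splitting, not the authors' implementation: the Gaussian-kernel weights in the hypothetical helper `gaussian_weights` stand in for the paper's rating and genre weights, and the names `convex_cluster_missing`, `cluster_labels`, `phi`, `nu`, and `tol` are all illustrative assumptions.

```python
import numpy as np


def gaussian_weights(X, mask, phi=0.5):
    """Hypothetical pair weights exp(-phi * d_ij^2), with d_ij over co-observed entries."""
    n = X.shape[0]
    w = []
    for i in range(n):
        for j in range(i + 1, n):
            both = mask[i] & mask[j]
            w.append(np.exp(-phi * np.sum((X[i, both] - X[j, both]) ** 2)))
    return np.array(w)


def convex_cluster_missing(X, mask, gamma, weights, nu=1.0, n_iter=2000):
    """ADMM for convex clustering with a pairwise fused penalty and missing entries."""
    n, p = X.shape
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    D = np.zeros((len(pairs), n))  # row l of D maps U to u_i - u_j
    for l, (i, j) in enumerate(pairs):
        D[l, i], D[l, j] = 1.0, -1.0
    Xf = np.where(mask, X, 0.0)  # unobserved entries contribute no loss
    # Per-column U-update systems; assumes every column has at least one observed entry.
    systems = [np.diag(mask[:, k].astype(float)) + nu * D.T @ D for k in range(p)]
    U = Xf.copy()
    V = D @ U                    # splitting variable, v_l = u_i - u_j at convergence
    Y = np.zeros_like(V)         # dual variables for that constraint
    for _ in range(n_iter):
        Z = V + Y / nu
        for k in range(p):       # U-update: weighted least squares, column by column
            U[:, k] = np.linalg.solve(systems[k], Xf[:, k] + nu * D.T @ Z[:, k])
        DU = D @ U
        A = DU - Y / nu
        norms = np.linalg.norm(A, axis=1)
        # V-update: group soft-thresholding, the prox of the weighted L2 penalty.
        scale = np.maximum(0.0, 1.0 - gamma * weights / (nu * np.maximum(norms, 1e-12)))
        V = scale[:, None] * A
        Y = Y + nu * (V - DU)    # dual ascent
    return U


def cluster_labels(U, tol=1e-2):
    """Greedily group rows whose fitted centroids coincide up to tol."""
    labels = np.full(U.shape[0], -1)
    next_label = 0
    for i in range(U.shape[0]):
        if labels[i] < 0:
            near = (np.linalg.norm(U - U[i], axis=1) < tol) & (labels < 0)
            labels[near] = next_label
            next_label += 1
    return labels
```

Usage: with two well-separated groups of three rows, each containing one missing entry encoded as `np.nan`, `mask = ~np.isnan(X)`, `U = convex_cluster_missing(X, mask, 0.5, gaussian_weights(X, mask))` fuses each group to a single centroid, and the missing entries take the value of their group's centroid. Kernel weights are used here because, with uniform weights on every pair, an entry with no data can be pulled toward the larger opposite group.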

The Structure of Plant Community in Jungdaesa-Birobong Area, Odaesan National Park (오대산국립공원 중대사-비로봉 구간 식물군집구조)

  • Han, Bong-ho;Choi, Jin-woo;Noh, Tai-hwan;Kim, Dong-wook
    • Korean Journal of Environment and Ecology / v.29 no.5 / pp.764-776 / 2015
  • This study aims to identify the structure of the plant community, the ecological succession sere, and changes in the forest ecosystem in the Jungdaesa-Birobong area of Odaesan National Park (i.e., at high altitudes, over 1,000 m), and to provide basic data for vegetation management planning. To assess the status of the forest vegetation between Jungdaesa and Birobong, seventeen plots (20 m × 20 m) were set up as research sites at high altitudes. The importance value, distribution by diameter at breast height (DBH), growth volume and age of sample trees, similarity index, and species diversity index were analyzed for each survey plot. According to the results of DCA (Detrended Correspondence Analysis), a multivariate statistical technique, the plant communities were classified into five groups: community I (Quercus mongolica-Tilia amurensis community), community II (Q. mongolica-deciduous broad-leaved community), community III (Q. mongolica-Pinus koraiensis community), community IV (Abies holophylla-Q. mongolica community), and community V (A. holophylla-deciduous broad-leaved community). Community I, dominated by Quercus mongolica and deciduous broad-leaved species, is located above 1,300 m (1,335 m to 1,495 m); communities IV and V, dominated by Abies holophylla, are located below 1,200 m (1,115 m to 1,175 m); and communities II and III, whose main species include Quercus mongolica, Pinus koraiensis, and Abies holophylla, are located between 1,160 m and 1,300 m. Quercus mongolica tends to have a higher importance value at higher altitudes, whereas Abies holophylla tends to have a higher importance value at lower altitudes.
Based on the importance values of woody species and the DBH class distribution, communities I, II, and III are expected to maintain their present status, whereas in communities IV and V the influence of Q. mongolica is predicted to weaken. The sample trees were between 85 and 161 years old, with an average age of 123 years. Shannon's species diversity index (H') differed between community I (at high altitude) and communities IV and V (at low altitude). Per 400 m² plot, community III had the highest H' (1.1109), followed by community II (1.0475), community I (1.0125), community IV (0.9918), and community V (0.8686). This study verified that Shannon's species diversity index differed significantly among plant communities. The H' values of the Quercus mongolica communities in this study were very similar to those reported in previous studies, whereas the community dominated by Abies holophylla had lower values than previously reported.
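The H' values above are Shannon's species diversity index, H' = -Σ p_i log(p_i), where p_i is the proportion of species i in a plot. The following is a minimal sketch; the function name `shannon_diversity` is hypothetical, the input is assumed to be per-species abundances (the abstract does not say which abundance measure was used), and the logarithm base is left as a parameter because the abstract does not state it.

```python
import math


def shannon_diversity(counts, base=math.e):
    """Shannon's H' = -sum(p_i * log(p_i)) over species with nonzero abundance."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:  # absent species contribute nothing (p * log p -> 0)
            p = c / total
            h -= p * math.log(p, base)
    return h
```

Because H' depends on the base, values are only comparable across studies that use the same base: four equally abundant species give ln 4 ≈ 1.386 with the natural log but log10 4 ≈ 0.602 with base 10.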

Regionalization of Extreme Rainfall with Spatio-Temporal Pattern (극치강수량의 시공간적 특성을 이용한 지역빈도분석)

  • Lee, Jeong-Ju;Kwon, Hyun-Han;Kim, Byung-Sik;Yoon, Seok-Yeong
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1429-1433 / 2010
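  • When designing hydraulic structures, establishing water resources management plans, or reviewing disaster impacts, probabilistic rainfall, flood discharge, and low-flow quantities corresponding to given return periods are estimated and used. Usually, the probability distribution of a hydrological event is estimated from long-term hydrological records for the target area, and the quantity for the desired design frequency is then estimated by extending the return period. For ungauged areas or areas with short observation records, the results of regional frequency analysis are used instead. The most basic step in regional frequency analysis is to establish the homogeneity of the rainfall data, which requires a preceding statistical classification analysis. To determine hydrological homogeneity for at-site frequency analysis, the L-moment method and cluster analysis by the K-means method are mainly used, and the results are visualized by spatial interpolation of the station coordinates. Precipitation is a hydrological variable that varies in both space and time, so its temporal characteristics are also very important in defining precipitation characteristics. In this study, we therefore extend the classification step of conventional regional frequency analysis, which focuses on the spatial coordinates of the stations and the magnitude of precipitation, by identifying factors that account for temporal effects and proposing a classification procedure that uses them. Specifically, the study aims to compute temporal statistics for each station using circular statistics, which allow quantitative analysis of the timing of extreme precipitation, to combine them with extreme precipitation amounts into spatio-temporal attribute data, and to develop a clustering model based on these data. Circular statistics were used to quantify and generalize the temporal attribute, and the extreme precipitation amount and occurrence timing were each represented by their mean and standard deviation. The combined data were clustered with the K-means algorithm, and the regionalization results were verified with the L-moment method. The clustering effect of the combined attribute data was confirmed through simulated-data experiments, and the analysis was carried out with data from 58 weather stations in Korea. In a preliminary stage, 100 cluster analyses were run to compute average centroids, which were then used as the initial centroids of the main analysis; this stabilized the otherwise variable clustering and prevented the results from changing as the analysis was repeated. In addition, cluster numbers were assigned according to the sum of spatial distances within each cluster computed by K-means, so that clusters with consecutive numbers are physically adjacent, which prevents errors that can arise when extracting the boundaries between clusters. The results of the regional frequency analysis were plotted with a three-dimensional spline technique.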
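The timing attribute above relies on circular statistics, which treat dates as angles so that late December and early January are close together. The abstract gives no formulas, so this is a minimal sketch of the standard approach: each day of year d becomes the angle 2πd/period, and the mean of the unit vectors gives a mean date and a mean resultant length r. The function name `seasonality` and the 365-day period are assumptions.

```python
import math


def seasonality(days, period=365.0):
    """Mean date and mean resultant length r of event dates (circular statistics)."""
    if not days:
        raise ValueError("need at least one date")
    angles = [2 * math.pi * d / period for d in days]
    x_bar = sum(math.cos(a) for a in angles) / len(angles)
    y_bar = sum(math.sin(a) for a in angles) / len(angles)
    mean_angle = math.atan2(y_bar, x_bar) % (2 * math.pi)
    mean_day = mean_angle * period / (2 * math.pi)
    r = math.hypot(x_bar, y_bar)  # 1: all events on one date; near 0: no dominant season
    return mean_day, r
```

An arithmetic mean of days 360 and 5 would be about 182, in midsummer; `seasonality([360, 5])` instead returns a mean date at the turn of the year. A circular standard deviation, if needed alongside the mean, is commonly taken as sqrt(-2 ln r).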


A Comparative Study of Cluster Analysis Programs Using Artificial Data (인위적 데이터를 이용한 군집분석 프로그램간의 비교에 대한 연구)

  • 김성호;백승익
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.35-49 / 2001
  • Over the years, cluster analysis has become a popular tool for marketing and segmentation researchers. Among the various clustering methods, K-means partitioning cluster analysis is the most popular for segmentation. However, because cluster analysis is very sensitive to the initial configuration of the data set at hand, selecting an appropriate starting configuration that is consistent with the clustering of the whole data set is important for improving the reliability of the clustering results. Programs for K-means cluster analysis use various methods to choose the initial seeds and compute the cluster centroids. In this paper, we suggest a methodology for evaluating clustering programs and, to explore its usability, use it to evaluate four clustering programs.
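The sensitivity to initial seeds described above can be seen with a plain Lloyd's K-means run from two different seed choices. This is a minimal sketch, not any of the four programs the study evaluated; the function name `kmeans` and the toy one-dimensional data are illustrative assumptions.

```python
import numpy as np


def kmeans(X, init_centroids, max_iter=100):
    """Lloyd's K-means from the given seeds; returns labels, centroids, and SSE."""
    X = np.asarray(X, dtype=float)
    C = np.array(init_centroids, dtype=float)
    for _ in range(max_iter):
        dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        new_C = C.copy()
        for k in range(len(C)):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous centroid
                new_C[k] = members.mean(axis=0)
        converged = np.allclose(new_C, C)
        C = new_C
        if converged:
            break
    labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    sse = float(((X - C[labels]) ** 2).sum())
    return labels, C, sse


X = np.array([[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]])
_, _, sse_first_rows = kmeans(X, X[:3])       # seeds: the first three rows
_, _, sse_spread = kmeans(X, X[[0, 2, 4]])    # seeds: one row from each group
```

With the first three rows as seeds (a simple rule some programs use), the run stops at a poor local optimum with SSE 101; with one seed per natural group it finds the three pairs, with SSE 1.5. Both runs are valid K-means, which is why the choice of starting configuration matters.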


Impacts of Automated Vehicle Platoons on Car-following Behavior of Manually-Driven Vehicles (군집주행 환경이 비자율차량의 차량 추종에 미치는 영향분석)

  • Suh, Sanghyuk;Lee, Seolyoung;Oh, Cheol;Choi, Saerona
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.4 / pp.107-121 / 2017
  • This study conducted a three-stage survey and simulation experiment to identify the impact of automated vehicle platoons on the car-following behavior of manually driven vehicles (MVs). Vehicle maneuvering data obtained from driving simulations were statistically analyzed using three measures: average speed, acceleration noise, and offset, which represents the deviation of lateral movement. The results indicate that MV drivers tended to feel a psychological burden while driving in automated vehicle platooning environments, which resulted in different vehicle maneuvers. The outcome of this study is expected to provide a useful basis for developing traffic operations strategies for managing mixed traffic streams of MVs and autonomous vehicles.
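Acceleration noise is commonly defined as the standard deviation of a vehicle's acceleration over a trip, so smoother driving gives a smaller value. The abstract does not say how the study computed it, so this is a minimal sketch assuming speeds sampled at a fixed time step `dt`, accelerations from finite differences, and a population standard deviation; the function name `driving_measures` is hypothetical.

```python
import math


def driving_measures(speeds, dt):
    """Average speed and acceleration noise from evenly sampled speeds."""
    if len(speeds) < 2 or dt <= 0:
        raise ValueError("need at least two samples and a positive time step")
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    mean_a = sum(accels) / len(accels)
    # Acceleration noise: standard deviation of acceleration around its mean.
    noise = math.sqrt(sum((a - mean_a) ** 2 for a in accels) / len(accels))
    return sum(speeds) / len(speeds), noise
```

A constant acceleration yields zero noise, while a speed that oscillates by 1 m/s every second, as in `driving_measures([10, 11, 10, 11, 10], 1.0)`, yields a noise of 1 m/s².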

Cause-specific Spatial Point Pattern Analysis of Forest Fire in Korea (우리나라 산불 발생의 원인별 공간적 특성 분석)

  • Kwak, Han-Bin;Lee, Woo-Kyun;Lee, Si-Young;Won, Myung-Soo;Koo, Kyo-Sang;Lee, Byung-Doo;Lee, Myung-Bo
    • Journal of Korean Society of Forest Science / v.99 no.3 / pp.259-266 / 2010
  • Forest fire occurrence in Korea is highly related to human activities, and its spatial distribution shows strong spatial dependency with a clustered pattern. In this study, we analyzed the spatial distribution of forest fires with point pattern analysis that accounts for spatial dependency. Distribution patterns by cause and distance were derived from Ripley's K-function, and spatially clustered intensity was estimated with kernel intensity estimation. Forest fires in Korea show a clustered pattern, although the degree of clustering differs by cause. Furthermore, the spatial clustering patterns can be classified into two groups by degree of clustering and distance. The first group shows a nationwide cluster pattern related to human activity near forests, such as accidental fires caused by people in the mountains and field incineration. The second group shows a localized cluster pattern, clustered within a short distance, and is associated with fires caused by smokers, arson, and accidents involving children. The range of localized clustering was 30 km; beyond this range, the pattern of forest fires gradually became random. Kernel intensity analysis showed that fires in the second, localized group occurred near Seoul, which is densely populated.
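Ripley's K-function counts, for each distance r, how many other events lie within r of a typical event, scaled by the study area. Under complete spatial randomness K(r) ≈ πr², so the transformed L(r) = sqrt(K(r)/π) ≈ r, and L(r) > r suggests clustering at scale r. This is a minimal naive estimator without edge correction (published analyses usually apply one); the names `ripley_k` and `ripley_l` are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K(r): area / (n(n-1)) * number of ordered pairs within r."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    return area * count / (n * (n - 1))


def ripley_l(points, r, area):
    """Besag's L(r) = sqrt(K(r) / pi); compare with r to judge clustering."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)
```

For the four corners of a unit square, `ripley_k` is 0 below r = 1, 2/3 at r = 1 (each point has two neighbors at distance 1), and 1 once r reaches the diagonal √2. The O(n²) double loop is fine for illustration; large fire inventories would need a spatial index.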

A study on the ordering of similarity measures with negative matches (음의 일치 빈도를 고려한 유사성 측도의 대소 관계 규명에 관한 연구)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.89-99 / 2015
  • The World Economic Forum and the Korean Ministry of Knowledge Economy have selected big data as one of the top 10 core information technologies. The key to big data is analyzing the properties of the data effectively. Cluster analysis, one of the big data techniques, assigns a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis can be classified into various types depending on the nature of the data. In this paper, we studied upper and lower bounds for binary similarity measures with negative matches, such as the Russell and Rao measure, the simple matching measure of Sokal and Michener, the Rogers and Tanimoto measure, the Sokal and Sneath measure, the Hamann measure, and the Baroni-Urbani and Buser measures I and II. These measures were also compared using real data and a simulation experiment.
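All of the measures listed above are functions of the 2×2 match counts for two binary vectors: a (both 1), b (1 in x only), c (1 in y only), and d (both 0, the negative matches). This is a minimal sketch using the textbook formulas; Sokal and Sneath proposed several coefficients, and the variant 2(a+d)/(2(a+d)+b+c) chosen here is an assumption, as are the function and key names.

```python
import math


def match_counts(x, y):
    """2x2 contingency counts (a, b, c, d) for two binary vectors."""
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if not u and v)
    d = sum(1 for u, v in zip(x, y) if not u and not v)
    return a, b, c, d


def _ratio(num, den):
    return num / den if den else float("nan")  # undefined, e.g. BUB when a = b = c = 0


def similarities(x, y):
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty vectors")
    s = math.sqrt(a * d)
    return {
        "russell_rao": a / n,
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + 2 * (b + c) + d),
        "sokal_sneath": 2 * (a + d) / (2 * (a + d) + b + c),
        "hamann": ((a + d) - (b + c)) / n,
        "baroni_urbani_buser_1": _ratio(s + a, s + a + b + c),
        "baroni_urbani_buser_2": _ratio(s + a - (b + c), s + a + b + c),
    }
```

For x = [1, 1, 0, 0, 1] and y = [1, 0, 0, 1, 1] the counts are (a, b, c, d) = (2, 1, 1, 1), giving simple matching 3/5, Rogers-Tanimoto 3/7, and Sokal-Sneath 3/4. Since these three share the numerator a+d and differ only in how b+c enters the denominator, Rogers-Tanimoto ≤ simple matching ≤ Sokal-Sneath holds for any counts.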

A Review of Swarm Control for Mobile Robots (이동 로봇의 군집 제어 리뷰)

  • Park, Bong-Seok;Kim, Hong-Geun
    • ICROS / v.19 no.2 / pp.34-38 / 2013
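  • Motivated by the swarming phenomena frequently observed in nature and their usefulness, research on cooperation among multiple mobile robots has recently been active. This paper explains swarm control methodologies for multiple mobile robots and introduces related recent results. In particular, it focuses on the representative approaches to the swarm control problem: the behavior-based approach, the virtual structure approach, the leader-follower approach, and the graph theory-based approach.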
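As a small illustration of the graph theory-based approach named above, one common formulation is a consensus protocol on the communication graph: each robot moves toward its neighbors' positions, corrected by its desired formation offset h_i, so that x_i - h_i converges to a common value. This is a minimal discrete-time sketch, x ← x - ε L (x - h) with L the graph Laplacian, not a method taken from the review; the function names, step size `eps`, and step count are assumptions, and `eps` must be small enough (below 2 divided by the largest Laplacian eigenvalue) for convergence.

```python
import numpy as np


def laplacian(n, edges):
    """Graph Laplacian of an undirected communication graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L


def formation_consensus(x0, offsets, edges, eps=0.2, steps=500):
    """Iterate x <- x - eps * L (x - h); x - h reaches consensus on a connected graph."""
    x = np.array(x0, dtype=float)
    h = np.array(offsets, dtype=float)
    L = laplacian(len(x), edges)
    for _ in range(steps):
        x = x - eps * L @ (x - h)
    return x
```

Three robots on a line graph 0-1-2 starting at positions 0, 3, and 9 with offsets 0, 1, and 2 settle at 3, 4, and 5: the formation shape comes from the offsets, and its location from the average of x - h, which the update preserves. The same code works for planar positions passed as an (n, 2) array.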

Vegetation Structure of Mountain Ridge from Suryeong to Sosagogae in Baekdudaegan, Korea (백두대간 수령-소사고개 구간의 식생구조)

  • 추갑철;김갑태
    • Korean Journal of Environment and Ecology / v.18 no.2 / pp.150-157 / 2004
  • To investigate the vegetation structure of the mountain ridge from Suryeong to Sosagogae, 10 plots (500 m² each) set up by random sampling were surveyed. Cluster analysis classified three groups: a Quercus dentata-Fraxinus rhynchophylla community, a Quercus mongolica-Fraxinus rhynchophylla community, and a Quercus mongolica community. Quercus mongolica was a major woody plant species in the ridge area from Suryeong to Sosagogae, while Quercus dentata and Fraxinus rhynchophylla partly occupied lower elevations. Species diversity (H') of the investigated groups ranged from 1.7295 to 2.6525, similar to that of the ridge areas of the national parks in Baekdudaegan. Rhododendron tschonoskii, a rare and endangered species on the Forest Administration's list, was distributed among the rocks at the top of Sambongsan, so long-term monitoring of its habitat may be required.
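The abstract does not name the clustering method used to group the plots. A common choice in vegetation studies is agglomerative clustering of a plots × species abundance matrix with Bray-Curtis dissimilarity and average linkage (UPGMA), so the sketch below assumes that combination. It uses only NumPy; the function names and the toy matrix are hypothetical.

```python
import numpy as np


def bray_curtis(A):
    """Pairwise Bray-Curtis dissimilarity between rows (plots) of an abundance matrix."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = A[i].sum() + A[j].sum()
            D[i, j] = np.abs(A[i] - A[j]).sum() / s if s > 0 else 0.0
    return D


def average_linkage(D, n_groups):
    """UPGMA: repeatedly merge the two clusters with the smallest mean dissimilarity."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_groups:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = D[np.ix_(clusters[p], clusters[q])].mean()
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)
```

Usage: given five hypothetical plots whose abundances are dominated by species 1, 1, 2, 2, and 3 respectively, `average_linkage(bray_curtis(A), 3)` returns `[[0, 1], [2, 3], [4]]`. In practice one would inspect the full dendrogram rather than fixing the number of groups in advance.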

Vegetation Structure of Mountain Ridge from Nogodan to Goribong in Baekdudaegan, Korea (백두대간 노고단-고리봉 구간의 식생구조)

  • 김갑태;추갑철
    • Korean Journal of Environment and Ecology / v.16 no.4 / pp.441-448 / 2003
  • To investigate the vegetation structure of the mountain ridge from Nogodan to Goribong, 22 plots (500 m² each) set up by random sampling were surveyed. Cluster analysis classified three groups: a Quercus mongolica-Fraxinus rhynchophylla community, a Quercus mongolica-Pinus densiflora community, and a Quercus mongolica community. Quercus mongolica was a major woody plant species in the ridge area from Nogodan to Goribong, and lower elevations were partly occupied by deciduous broad-leaved tree species and Pinus densiflora. Abies koreana, a species endemic to Korea, was present in small amounts in this area. Species diversity (H') of the investigated groups ranged from 0.9274 to 1.2845, similar to that of the ridge areas of the national parks in Baekdudaegan.