• Title/Summary/Keyword: relative hierarchical clustering (상대적 계층적 군집)

Microarray data analysis using relative hierarchical clustering (상대적 계층적 군집 방법을 이용한 마이크로어레이 자료의 군집분석)

  • Woo, Sook Young;Lee, Jae Won;Jhun, Myoungshic
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.999-1009 / 2014
  • Hierarchical clustering makes it easy to explore massive microarray data and, through the dendrogram, to understand biological phenomena. However, because hierarchical clustering algorithms consider only absolute similarity, they cannot capture relative dissimilarity, which considers not only the distance between a pair of clusters but also how distant they are from the rest of the clusters. In this study, we introduce the relative hierarchical clustering method proposed by Mollineda and Vidal (2000) and compare it with conventional hierarchical clustering on simulated and real data in various situations. The quality of the two methods was evaluated using the percentage of incorrectly grouped points (PIGP), homogeneity, and separation.
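The homogeneity and separation measures named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: it uses conventional average-linkage clustering from SciPy (not the relative method of Mollineda and Vidal), synthetic data, and one common centroid-based definition of the two measures, which may differ from the definitions used in the paper. The helper name `homogeneity_separation` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def homogeneity_separation(X, labels):
    """Assumed definitions (not taken from the paper):
    homogeneity = mean distance from each point to its cluster centroid
                  (smaller means tighter clusters);
    separation  = size-weighted mean distance between cluster centroids
                  (larger means better separated clusters)."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    sizes = np.array([(labels == k).sum() for k in ids])
    # Centroid of the cluster each point belongs to
    own = centroids[np.searchsorted(ids, labels)]
    homogeneity = np.linalg.norm(X - own, axis=1).mean()
    num = den = 0.0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            weight = sizes[i] * sizes[j]
            num += weight * np.linalg.norm(centroids[i] - centroids[j])
            den += weight
    separation = num / den if den else 0.0
    return homogeneity, separation

# Synthetic stand-in for expression profiles: two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 5)), rng.normal(5.0, 0.5, (20, 5))])

Z = linkage(X, method="average")                  # conventional agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
hom, sep = homogeneity_separation(X, labels)
print(f"homogeneity={hom:.3f}, separation={sep:.3f}")
```

On data like this, a good partition shows a small homogeneity value and a large separation value; comparing the pair across two clustering methods is the kind of evaluation the abstract describes.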

A Comparative Study on the Agglomerative and Divisive Methods for Hierarchical Document Clustering (계층적 문서 클러스터링을 위한 응집식 기법과 분할식 기법의 비교 연구)

  • Lee, Jae-Yun;Jeong, Jin-Ah
    • Proceedings of the Korean Society for Information Management Conference / 2005.08a / pp.65-70 / 2005
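  • This study assumed that, in hierarchical document clustering, the relative performance of agglomerative and divisive methods depends on the test collection, and that the deciding factor is the depth of classification, that is, the classification level. The reasoning was that divisive methods should be relatively advantageous for coarse classification, where only a few splits are needed, while agglomerative methods should be advantageous for fine classification, where only a few merges are needed. Accordingly, the divisive bisecting K-means method was compared with the agglomerative complete-linkage, average-linkage, and Ward methods, applying similarity coefficients to test collections with coarse and with fine classification, in order to find the clustering method suited to the characteristics of each collection. The experiments showed that what determines which approach performs better is not the classification level but the relative variation in cluster size, measured by the coefficient of variation.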

A Study on the Site Selection of Public Libraries Using Analytic Hierarchy Process Technique and Geographic Information System (계층분석법과 지리정보시스템을 이용한 공공도서관 입지선정에 관한 연구)

  • Park, Sung-Jae;Lee, Jee-Yeon
    • Journal of the Korean Society for Information Management / v.22 no.1 s.55 / pp.65-85 / 2005
  • This study proposes a new site selection model that integrates the opinions of several groups and identifies sites through an objective selection procedure. The proposed model consists of two parts, the Analytic Hierarchy Process (AHP) and a Geographic Information System (GIS), and was applied to Seocho-gu in Seoul. First, library site selection criteria were determined through a literature review. The hierarchical relationship among the criteria was determined from a questionnaire and refined to suit the Seocho-gu case. A survey was conducted with three groups: library users, librarians, and public officials. A few inconsistent responses were excluded, and the relative importance of each criterion was measured. Next, an overlay method was applied, using the relative importance as a weight for selecting candidates. This process excluded areas where a library cannot be built, such as rivers, military areas, and other legally restricted areas, and resulted in seventy-five sites. Five groups of candidates were identified according to the similarity of their criteria. Finally, after eliminating the one poorly fitting group, four groups were selected.
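The AHP step described above (deriving relative importance from pairwise comparisons and screening out inconsistent responses) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the three criteria and the comparison values are invented, the function name `ahp_weights` is hypothetical, and the weights are approximated by power iteration. The random index values are the commonly tabulated ones from Saaty.

```python
# Saaty's random consistency index for matrices of size 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal pairwise
    comparison matrix, using power iteration for the principal eigenvector."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]           # keep weights summing to 1
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0   # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0                          # consistency ratio
    return w, cr

# Hypothetical criteria: accessibility, population served, land cost.
# Entry [i][j] says how much more important criterion i is than criterion j.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, cr = ahp_weights(comparisons)
print([round(x, 3) for x in weights], round(cr, 4))
```

A response is conventionally treated as acceptably consistent when the consistency ratio is below 0.1; the resulting weights could then serve as layer weights in a GIS overlay.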

A Study of Computational Literature Analysis based Classification for a Pairwise Comparison by Contents Similarity in a section of Tokkijeon, 'Fish Tribe Conference' (컴퓨터 문헌 분석 기반의 토끼전 '어족회의' 대목 내용 유사도에 따른 이본 계통 분류 연구)

  • Kim, Dong-Keon;Jeong, Hwa-Young
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.15-25 / 2022
  • This study aims to identify the families and lineages of the variant versions (ibon) of the "Fish Tribe Conference" section of Tokkijeon using computational literature analysis. We first encode the type of each paragraph in each variant to build a corpus and, based on this, use the Hamming distance to compute a distance matrix between variants. We visualize the clustering pattern of the variants by applying multidimensional scaling and hierarchical clustering, and explore the characteristics of the families and lineages of the "Fish Tribe Conference" section in comparison with an existing cluster analysis of all paragraphs of Tokkijeon. As a result, unlike the cluster analysis of the whole of Tokkijeon, which consists of six categories, the "Fish Tribe Conference" section has five categories and some variant accesses. The contribution of this study is that the relative distance between variants was measured and a systematic classification was performed objectively and empirically by calculation, and that the characteristics of the lineages of the "Fish Tribe Conference" section were revealed in comparison with the analysis of the whole of Tokkijeon.
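The pipeline in the abstract above (encode each paragraph's type, compute Hamming distances between variants, then cluster hierarchically) can be sketched with SciPy. The encoding below is entirely hypothetical: five invented variants with six paragraph type codes each. The choice of average linkage is also an assumption, and the multidimensional scaling step is left out.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical corpus: one row per variant, one column per paragraph,
# each value a code for the type of that paragraph in that variant.
codes = np.array([
    [1, 1, 2, 3, 1, 2],   # variant A
    [1, 1, 2, 3, 1, 1],   # variant B
    [2, 3, 1, 1, 2, 2],   # variant C
    [2, 3, 1, 1, 3, 2],   # variant D
    [1, 2, 2, 3, 1, 2],   # variant E
])

# SciPy's "hamming" metric is the fraction of positions that differ
condensed = pdist(codes, metric="hamming")
D = squareform(condensed)              # full symmetric distance matrix

Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")   # cut into two lineages
print(np.round(D, 3))
print(labels)
```

Here variants A, B, and E differ from each other in at most two of six paragraphs and fall into one group, while C and D form the other.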

Effective Classification Method of Hierarchical CNN for Multi-Class Outlier Detection (다중 클래스 이상치 탐지를 위한 계층 CNN의 효과적인 클래스 분할 방법)

  • Kim, Jee-Hyun;Lee, Seyoung;Kim, Yerim;Ahn, Seo-Yeong;Park, Saerom
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.81-84 / 2022
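  • Outlier detection in manufacturing is an important factor for product quality and for reducing operating costs, and it has recently been automated using deep learning. CNNs are among the deep learning techniques used for outlier detection, and many previous studies have shown that arranging CNNs hierarchically can improve performance relative to a single CNN model. We therefore used the MVTec-AD dataset to investigate whether a hierarchical CNN is effective for multi-class outlier detection. In the experiments, the single CNN reached an accuracy of 0.7715 and the hierarchical CNN 0.7838, confirming that the hierarchical approach can improve performance on multi-class outlier detection. A drawback of the hierarchical CNN is that the number of models and parameters and the resource usage grow exponentially compared with a single CNN. To keep the advantages of the hierarchical CNN while saving resources, we built hierarchical CNNs from new classes produced by K-means, GMM, and hierarchical clustering, obtaining accuracies of 0.7930, 0.7891, and 0.7936, respectively. This confirms that grouping objects appropriately with a clustering algorithm can match or exceed the performance of building a separate state-judgment model for each object while reducing resource usage.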

Collision Risk Assessment by using Hierarchical Clustering Method and Real-time Data (계층 클러스터링과 실시간 데이터를 이용한 충돌위험평가)

  • Vu, Dang-Thai;Jeong, Jae-Yong
    • Journal of the Korean Society of Marine Environment & Safety / v.27 no.4 / pp.483-491 / 2021
  • Identifying regional collision risks in water areas is important for the safety of navigation. This paper introduces a new method of collision risk assessment that incorporates hierarchical clustering, a distance-based clustering method, and uses real-time data on multiple surrounding vessels, together with group methodology and a preliminary assessment, to classify vessels and establish the basis for collision risk evaluation (called HCAAP processing). The vessels are clustered hierarchically to obtain clusters of encounter vessels, and the clusters are combined with the preliminary assessment to filter out relatively safe vessels. Subsequently, the distance at the closest point of approach (DCPA) and the time to the closest point of approach (TCPA) between encounter vessels within each cluster are calculated and related to the collision risk index (CRI). The mathematical relationship between the CRI of each cluster of encounter vessels and DCPA and TCPA is constructed using a negative exponential function. Operators can easily evaluate the safety of all vessels navigating in the defined area using the calculated CRI. This framework can therefore improve the safety and security of vessel traffic and reduce the loss of life and property. To illustrate the effectiveness of the proposed framework, an experimental case study was conducted in the coastal waters of Mokpo, Korea. The results demonstrated that the framework was effective and efficient in detecting and ranking collision risk indices between encounter vessels within each cluster, which allowed automatic risk prioritization of encounter vessels for further investigation by operators.
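DCPA and TCPA, the two quantities the abstract above feeds into the CRI, follow from the relative position and relative velocity of two vessels. The sketch below uses the standard constant-velocity, flat-plane formulas; the function name `cpa`, the coordinate convention, and the sample values are assumptions for illustration. The paper's negative exponential CRI function is not given in the abstract, so it is not reproduced here.

```python
import math

def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (dcpa, tcpa) for two vessels moving at constant velocity.

    Positions are (x, y) in nautical miles and velocities are (vx, vy) in
    knots, so TCPA is in hours. A negative TCPA means the closest point of
    approach has already passed."""
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]   # relative position
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]   # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # No relative motion: the range between the vessels never changes
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / speed_sq
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa

# Own ship heading east at 10 kn; target 10 nm ahead and 1 nm to the north,
# heading west at 10 kn (a near head-on encounter).
dcpa, tcpa = cpa((0.0, 0.0), (10.0, 0.0), (10.0, 1.0), (-10.0, 0.0))
print(f"DCPA = {dcpa:.2f} nm, TCPA = {tcpa:.2f} h")
```

In this example the vessels close at 20 kn, so they reach their closest point after half an hour, passing 1 nm apart.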

An Efficiency Analysis of Industry-University-Public Research Institute Collaborative Research: Employing the Input-Output Itemization Model (투입 및 산출 분해모형을 활용한 산학연 협력연구의 효율성 분석)

  • Kim, Hong-Young;Chung, Sunyang
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.12 / pp.473-484 / 2017
  • This study analyzed collaborative R&D projects funded by the Korean government from 2013 to 2015. For this analysis, the input and output variables of the projects were considered, and combinations of those variables were itemized. The output-oriented variable returns to scale (VRS) model, an extension of the DEA methodology, was adopted to evaluate the cooperation efficiency of the types of R&D collaboration, which were classified according to the organization of the project leader. In addition, hierarchical cluster analysis was conducted using the efficiency results of the scientific, technical, and economic outcome models. The results showed that cooperation efficiency between large companies and public research institutions was relatively high. Conversely, cooperation among medium-sized companies, small businesses, and universities was particularly inefficient. The clustering results demonstrated the various strengths and weaknesses of the types in terms of publications, patents, technology royalties, and the number of commercializations. In conclusion, this study suggests differentiated investment portfolios and strategies based on the efficiency results of the diverse cooperation types among industries, universities, and public research institutions.

A Study on the Implementation of Walking Environment Projects by Analyzing Characteristics of Pedestrian Accidents by Local Government Types (지방자치단체의 유형별 보행자사고 특성분석 및 보행환경조성사업 개선방안 연구)

  • Park, Jinkyung;Han, Myungjoo
    • Journal of Korean Society of Transportation / v.32 no.6 / pp.615-627 / 2014
  • In this study, nonhierarchical K-means cluster analysis is used to classify 230 local governments into types, and the Mann-Whitney U test and Kruskal-Wallis analysis are used to analyze the characteristics of pedestrian accidents by region type. Based on an empirical analysis of pedestrian accidents, the study suggests improvements to walking environments that reflect local characteristics. The cluster analysis yielded three types: Type 1-A (relatively dominant urban commercial areas), Type 1-B (predominantly urban residential areas), and Type 2 (rural areas). According to the results, the pedestrian accident rate on community roads exceeded 60% for all types, and the incidence rate was higher in rural areas than in urban areas. In addition, pedestrian accidents at intersections and crossings occurred more frequently in Type 1-B than in Type 2, while the number of roadside casualties was highest for Type 2.
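The nonparametric comparisons named in the abstract above can be sketched with SciPy once each local government has been assigned a type. The accident rates below are invented purely for illustration and are not the study's data; the clustering step itself is omitted.

```python
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical pedestrian-accident rates per 10,000 residents, grouped by
# the region type assigned in a prior clustering step (invented values).
rates = {
    "Type 1-A": [3.1, 2.8, 3.5, 3.0, 2.9, 3.3],
    "Type 1-B": [4.2, 4.6, 4.0, 4.4, 4.8, 4.1],
    "Type 2":   [5.9, 6.3, 5.5, 6.8, 6.1, 5.7],
}

# Kruskal-Wallis: do the three types differ in distribution at all?
h_stat, p_all = kruskal(*rates.values())

# Mann-Whitney U: follow-up comparison of one pair of types
u_stat, p_pair = mannwhitneyu(rates["Type 1-B"], rates["Type 2"],
                              alternative="two-sided")

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_all:.4f}")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_pair:.4f}")
```

Both tests compare ranks rather than raw values, which suits accident rates that are skewed or measured on only a few regions per type.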

Temple Forest Vegetation Structure of Cultural Heritage Site in Mt. Gyeryongsan National Park - Focused on Donghaksa, Gapsa and Sinwonsa - (계룡산국립공원 공원문화유산지구 사찰림의 식생구조)

  • Song, Ju-Hyeon;Kwon, Soon-Sun;Kim, Ho-Jin;Lee, Jeong-Eun;Yun, I-Seul;Siswo, Siswo;Kim, Hyun-Seop;Yun, Chung-Weon
    • Korean Journal of Environment and Ecology / v.33 no.6 / pp.722-733 / 2019
  • This study was carried out to provide basic information for the ecological preservation and management of temple forests (Donghaksa, Gapsa, Sinwonsa) by investigating the ecological characteristics of the vegetation structure of the Cultural Heritage Site in Mt. Gyeryongsan National Park, based on the Braun-Blanquet vegetation survey method, from September 2018 to May 2019. Hierarchical cluster analysis classified the forest vegetation into 3 vegetation units (Zelkova serrata - Akebia quinata - Kerria japonica community, VU1; Quercus serrata - Callicarpa japonica - Carpinus cordata community, VU2; and Pinus densiflora - Prunus sargentii - Fraxinus sieboldiana community, VU3). The indicator species of the vegetation units comprised 12 taxa, 8 taxa, and 6 taxa, respectively. The importance value analysis showed that Z. serrata had the highest importance value in all vegetation units, and the species diversity analysis showed that the species diversity of VU3 was 0.939, higher than that of the other vegetation units. The CCA of the correlation between vegetation units and abiotic environmental factors showed that VU2 was negatively correlated with altitude, and biotic environmental factors had no significant correlation with vegetation units.

Monitoring of Groundwater quality according to groundwater use for agriculture (농업용 지하수 사용에 따른 지하수질 모니터링 평가)

  • Ha, Kyoochul;Ko, Kyung-Seok;Lee, Eunhee;Kim, Sunghyun;Park, Changhui;Kim, Gyoo-Bum
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.30-30 / 2020
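  • This study was carried out to evaluate changes in groundwater quality with seasonal groundwater use in an area where groundwater is used intensively in summer as agricultural water (for rice farming). The study area covers 2.83 km² (283.3 ha) and includes parts of Yanggok-ri and Singok-ri in Hongseong-gun, Chungcheongnam-do. To evaluate the spatial distribution and temporal variation of groundwater quality in the study area, 21 groundwater wells were surveyed and analyzed twice in 2019 (July and October). Temperature (T), pH, dissolved oxygen (DO), electrical conductivity (EC), and oxidation-reduction potential (Eh) were measured in the field, and major cations and trace elements (Ca, Mg, Na, K, Si, Sr), anions (F, Cl, Br, NO2, NO3, PO4, SO4), alkalinity, dissolved organic carbon (DOC), and dissolved organic matter (DOM) were analyzed in the laboratory. In the water-quality survey, 14 to 15 wells (67 to 71%) were classified as the Ca-HCO3 type, followed by the Ca-Cl type at 4 to 5 wells (19 to 24%). Shallow wells showed higher concentrations than relatively deep wells for most constituents (TDS, Ca, Mg, Na, K, Cl, SO4, HCO3, DOC). Principal component analysis (PCA) and hierarchical cluster analysis (HCA), both multivariate statistical methods, were applied to the water-quality data; the first three principal components (PC1 54.0%, PC2 14.2%, PC3 12.3%) explained 88.3% of the total variance. PC1 was mainly influenced by Ca, Mg, Na, K, Cl, SO4, and DOC, while PC2 was influenced by HCO3, NO3, and DO. HCA divided the groundwater of the study area into two main groups, apart from well C-3, which is of the Na-Cl type. The multivariate statistical results also showed water-quality characteristics similar to those indicated by the hydrogeochemical, isotopic, and dissolved organic matter characteristics. Except at a few wells, no significant change in water quality was observed between the two sampling periods, and groundwater levels recovered quickly after groundwater use.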
