• Title/Summary/Keyword: hierarchical clustering (계층적 군집화)

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.745-758 / 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures have been suggested for validating the results of a cluster analysis. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering, and compute validity measures such as connectivity, the Dunn index, and the silhouette on simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull prepared by the BBQ cooking method.
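
Two of the validity measures named in this abstract, the silhouette and the Dunn index, can be illustrated with a short sketch. This is a minimal pure-Python illustration (Python 3.8+ for `math.dist`), not the authors' code: the toy points, the two candidate labelings, the function names, and the use of Euclidean distance are all assumptions made for the example.

```python
import math


def group_by_label(labels):
    """Map each cluster label to the list of point indices it contains."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return clusters


def mean_silhouette(points, labels):
    """Average silhouette width; assumes at least two clusters."""
    clusters = group_by_label(labels)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    groups = list(group_by_label(labels).values())
    min_separation = min(
        math.dist(points[i], points[j])
        for x in range(len(groups))
        for y in range(x + 1, len(groups))
        for i in groups[x]
        for j in groups[y]
    )
    max_diameter = max(
        math.dist(points[i], points[j])
        for members in groups
        for i in members
        for j in members
    )
    return min_separation / max_diameter


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, mean_silhouette(points, labels), dunn_index(points, labels))
```

Both measures are larger for the labeling that matches the structure of the data, which is how they can be used to compare algorithms or candidate numbers of clusters.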

A Fusion of the Period Characterized and Hierarchical Bayesian Techniques for Efficient Cluster Analysis of Time Series Data (시계열자료의 효율적 군집분석을 위한 구간특징화와 계층적 베이지안 기법의 융합)

  • Jung, Young-Ae;Jeon, Jin-Ho
    • Journal of Digital Convergence / v.13 no.7 / pp.169-175 / 2015
  • An effective way to understand dynamic time series data that evolve over time, such as valuations, is to build a model that analyzes the behavior of the system. Rather than examining all of the collected time series at once, it is more efficient to cluster the population into a certain number of subgroups and to understand the overall data through a model determined for each group. This study proposes a two-step process that first forms the subgroups and then determines a model for each cluster, fusing period characterization of the subpopulations with a heuristic Bayesian clustering technique. Experiments using actual valuation data confirmed that the process can reduce computation time and cost.

A change of the public's emotion depending on Temperature & Humidity index (온습도에 따른 대중의 감성(감정+감각) 활동 변화)

  • Yang, Junggi;Kim, Geunyoung;Lee, Youngho;Kang, Un-Gu
    • Journal of Digital Convergence / v.12 no.10 / pp.243-252 / 2014
  • Many studies on the effects of social media on political, economic, and sociocultural phenomena are in progress. The authors used NAVER Trend, the most popular web search service in Korea, the NAVER Blog social media service, the NAVER Cafe service, and Open Data (API), together with temperature and humidity index data from the Korea Meteorological Administration. This study analyzed changes in the public's emotion in Korea using cluster analysis of taste-related vocabulary among words for feelings and senses. K-means clustering was performed with the number of groups determined by a chi-square goodness-of-fit test and Ward's method. Eight groups were formed, each representing sensibility vocabulary. Discriminant analysis showed that the eight groups determined by cluster analysis had 98.9% accuracy. Changes in the public's emotion can help predict people's activity, so that people can share sensibilities and develop a bond of sympathy.

The Comparison of Neural Network and k-NN Algorithm for News Article Classification (신경망 또는 k-NN에 의한 신문 기사 분류와 그의 성능 비교)

  • 조태호
    • Proceedings of the Korean Information Science Society Conference / 1998.10c / pp.363-365 / 1998
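  • Text mining is the process of extracting patterns or relationships from text documents in order to produce new information that the user wants or to transform existing information. Its functions include document categorization, document clustering, and document summarization. Document categorization assigns predefined categories to documents, document clustering organizes documents into a hierarchical structure, and document summarization extracts only the part of a document that can represent its overall content. This paper deals only with document categorization and applies it to newspaper articles, using four categories: politics, economy, sports, and information and communications. Document categorization, also called document classification, aims to assign categories to documents automatically and thereby reduce the time and cost of assigning them manually. k-NN (k-Nearest Neighbor) and a neural network were applied to document categorization, and the neural network outperformed k-NN.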
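
As a rough illustration of the k-NN side of this comparison, the sketch below classifies a short text by majority vote among its k most similar training texts. It is not the paper's implementation: the bag-of-words representation, the cosine similarity, the function names, and the tiny English training set are all assumptions invented for the example; only the four category names come from the abstract.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as lowercase word counts (an assumed, very simple encoding)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(query, training, k=3):
    """Return the majority label among the k training texts most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        training,
        key=lambda item: cosine(q, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training headlines, two per category.
training = [
    ("parliament passes election reform bill", "politics"),
    ("president meets opposition party leaders", "politics"),
    ("central bank raises interest rates", "economy"),
    ("stock market falls as exports slow", "economy"),
    ("home team wins baseball championship game", "sports"),
    ("striker scores twice in soccer final", "sports"),
    ("new smartphone chip speeds up mobile internet", "information and communications"),
    ("telecom firm launches broadband internet network", "information and communications"),
]

print(knn_classify("bank cuts interest rates as stock market falls", training))
print(knn_classify("soccer team wins final game", training))
```

With so little training data the vote is fragile; a real categorizer would need many labeled articles per category and a more careful text representation.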

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.343-352 / 2000
  • In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm. Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-1: Allocate the observation $x_i$ to the cluster F if it satisfies ..... where N is the total number of observations, for $i = 1, \ldots, N$. Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible. Double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is largely unnecessary. The method is demonstrated numerically on Fisher's iris data.

Semantic Clustering of Predicates using Word Definition in Dictionary (사전 뜻풀이를 이용한 용언 의미 군집화)

  • Bae, Young-Jun;Choe, Ho-Seop;Song, Yoo-Hwa;Ock, Cheol-Young
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.271-298 / 2011
  • A lexical semantic system should be built in order to grasp lexical semantic information more clearly. In this paper, we study the semantic clustering of predicates, which is one of the steps in building such a system. Unlike previous studies, which used subcategorization arguments (subject and object), selectional restrictions, and adverb interaction information, we used sense-tagged dictionary definitions for the semantic clustering of predicates, and also attempted hierarchical clustering of predicates using the relationship between generic and specific concepts. Most of the predicates in the dictionary were used for clustering: a total of 106,501 predicates (85,754 verbs and 20,747 adjectives) were used in the test. The clustering produced 2,748 predicate clusters, 130 recursive definition clusters, and 261 sub-clusters, with a maximum cluster depth of 16. For evaluation, we compared the clustering results with the Sejong semantic classes; the results showed a cohesion of 70.14%.

Initial Seed Generation for Constrained K-means (제약된 K-means를 위한 초기 씨드 생성방법)

  • Seo, Hyang-Suk;Kang, Jae-Ho;Ryu, Kwang-Ryel
    • Proceedings of the Korea Information Processing Society Conference / 2003.11a / pp.283-286 / 2003
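  • In clustering, a result in which one cluster forms per class or category is generally preferred. However, when the data follow an irregular distribution, representing each class fully with a single cluster may be impossible or even unnatural. This paper proposes a way of generating initial seeds for the constrained K-means algorithm, which clusters labeled examples (those whose class is known) together with unlabeled ones, so that more natural clusters can form. Hierarchically clustering the labeled examples yields seed sets for constrained K-means at various levels. In this study, we measured the degree of change between the clustering results obtained from the seed sets of each level and selected the seed set estimated to be the most appropriate. Applying the proposed method to a document clustering problem, experiments confirmed that it forms better clusters than assuming one cluster per class.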
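
The role of the seed sets can be sketched with a small constrained K-means loop in which each seed group of labeled points starts one cluster and never leaves it. This is only an illustration under stated assumptions, not the authors' procedure: the seed groups are hand-picked here rather than derived by hierarchical clustering, the seed-selection step described in the abstract is not implemented, and the function names and 2-D toy points are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def constrained_kmeans(seed_groups, unlabeled, iterations=10):
    """Cluster unlabeled points around seed groups whose members stay fixed.

    seed_groups: list of lists of labeled points; each list seeds one cluster.
    Returns the final centroids and the cluster index of each unlabeled point.
    """
    centroids = [centroid(group) for group in seed_groups]
    assignment = []
    for _ in range(iterations):
        # Labeled seed points are constrained to remain in their own cluster.
        members = [list(group) for group in seed_groups]
        assignment = []
        for point in unlabeled:
            nearest = min(
                range(len(centroids)),
                key=lambda c: math.dist(point, centroids[c]),
            )
            members[nearest].append(point)
            assignment.append(nearest)
        centroids = [centroid(group) for group in members]
    return centroids, assignment


# Hypothetical example: class A occupies two separate regions, so it is
# given two seed groups; class B is given one.
seed_groups = [
    [(0, 0), (0, 1)],    # class A, first region
    [(10, 0), (10, 1)],  # class A, second region
    [(5, 8), (5, 9)],    # class B
]
unlabeled = [(1, 0), (9, 1), (5, 7), (0.5, 0.5)]

centroids, assignment = constrained_kmeans(seed_groups, unlabeled)
print(assignment)
```

Giving class A two seed groups lets its two regions form separate clusters instead of being forced into one, which is the situation the abstract describes as more natural for irregularly distributed data.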

Hierarchical Clustering Methodology for Source Code Plagiarism Detection (계층적 군집화 기법을 이용한 소스 코드 표절 검사)

  • Sohn, Ki-Rack;Moon, Seung-Mi
    • Journal of The Korean Association of Information Education / v.11 no.1 / pp.91-98 / 2007
  • Plagiarism is a serious problem in school education because of current technologies such as the internet and word processors. This paper presents a way to detect source code plagiarism using similarity based on string comparison methods. The main contribution is the use of a hierarchical agglomerative clustering technique to classify plagiarism groups, which are then visualized as a dendrogram. Graders can apply an empirical threshold to the dendrogram to navigate the plagiarism groups. We evaluated the performance of the presented method on real-world data, and the results showed the usefulness and applicability of the method.
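
The idea of grouping submissions by string similarity and cutting at a threshold can be sketched as follows. This is a hedged illustration, not the paper's method: the abstract does not say which string comparison or linkage rule is used, so `difflib.SequenceMatcher` and single linkage are assumptions, as are the threshold value, the function names, and the three toy submissions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Character-level similarity in [0, 1] (assumed stand-in for the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_submissions(sources, threshold=0.8):
    """Agglomerative clustering that stops once no pair of clusters reaches the threshold.

    sources: dict mapping a submission name to its source text.
    Stopping at the threshold corresponds to cutting the dendrogram at that height.
    """
    clusters = [[name] for name in sources]

    def linkage(c1, c2):
        # Single linkage: similarity of the most similar cross-cluster pair.
        return max(similarity(sources[x], sources[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical submissions: "bob" is "alice" with renamed variables.
sources = {
    "alice": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "bob": "def total(ys):\n    s = 0\n    for y in ys:\n        s += y\n    return s\n",
    "carol": "import math\nprint(math.sqrt(16))\n",
}

groups = cluster_submissions(sources)
print(groups)
```

Raising the threshold makes the grouping stricter, and lowering it merges more submissions into each suspected group, which mirrors a grader moving the cut height on the dendrogram.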

Word Sense Disambiguation Using Korean Word Definition Vectors (한국어 단어 정의 벡터를 이용한 단어 의미 모호성 해소)

  • Park, Jeong Yeon;Lee, Jae Sung
    • Annual Conference on Human and Language Technology / 2021.10a / pp.195-198 / 2021
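  • According to previous studies, word sense disambiguation performance improves when it uses sense vocabulary tags compressed on the basis of the hierarchical relations in a thesaurus. This paper proposes a method that, without using a thesaurus, creates compressed sense vocabulary tags by clustering the sense definitions of words in a Korean dictionary. It also proposes a BERT-based deep learning model that uses these tags to disambiguate word senses efficiently. In experiments on the Korean Sejong sense-tagged corpus, the proposed method achieved an F1 of 97.21%, an improvement of 1.63%p over the F1 of 95.58% of the existing method.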

Pattern Clustering of Symmetric Regional Cerebral Edema on Brain MRI in Patients with Hepatic Encephalopathy (간성뇌증 환자의 뇌 자기공명영상에서 대칭적인 지역 뇌부종 양상의 군집화)

  • Chun Geun Lim;Hui Joong Lee
    • Journal of the Korean Society of Radiology / v.85 no.2 / pp.381-393 / 2024
  • Purpose: Metabolic abnormalities in hepatic encephalopathy (HE) cause brain edema or demyelinating disease, resulting in symmetric regional cerebral edema (SRCE) on MRI. This study aimed to investigate the usefulness of the clustering analysis of SRCE in predicting the development of brain failure. Materials and Methods: MR findings and clinical data of 98 consecutive patients with HE were retrospectively analyzed. The correlation between the 12 regions of SRCE was calculated using the phi (φ) coefficient, and the pattern was classified using hierarchical clustering with the φ² distance measure and Ward's method. The classified patterns of SRCE were correlated with clinical parameters such as the model for end-stage liver disease (MELD) score and HE grade. Results: Significant associations were found between 22 pairs of regions of interest, including the red nucleus and corpus callosum (φ = 0.81, p < 0.001), crus cerebri and red nucleus (φ = 0.72, p < 0.001), and red nucleus and dentate nucleus (φ = 0.66, p < 0.001). After hierarchical clustering, 24 cases were classified into Group I, 35 into Group II, and 39 into Group III. Group III had a higher MELD score (p = 0.04) and HE grade (p = 0.002) than Group I. Conclusion: Our study demonstrates that the SRCE patterns can be useful in predicting hepatic preservation and the occurrence of cerebral failure in HE.
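
The phi coefficient used here to relate pairs of regions is the standard association measure for two binary variables, computed from their 2×2 contingency table. The sketch below shows that computation only; it does not reproduce the study's clustering step, and the function name and the binary vectors (1 = edema present in a region for a given patient) are made-up illustrations, not study data.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient of two equal-length binary sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)          # both present
    n10 = sum(1 for a, b in zip(x, y) if a and not b)      # only x present
    n01 = sum(1 for a, b in zip(x, y) if not a and b)      # only y present
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)  # both absent
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        return 0.0  # undefined when a variable is constant; 0.0 is a convention
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical findings for eight patients in two regions.
region_a = [1, 1, 1, 0, 0, 0, 1, 0]
region_b = [1, 1, 0, 0, 0, 0, 1, 0]

phi = phi_coefficient(region_a, region_b)
print(round(phi, 4), round(phi ** 2, 4))
```

Values of φ near 1 mean the two regions tend to be affected in the same patients, and squaring φ removes the sign so that only the strength of association remains.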