Integrated Search | Korea Science

On the clustering of huge categorical data

Kim, Dae-Hak
- Journal of the Korean Data and Information Science Society / Vol. 21, No. 6 / pp.1353-1359 / 2010
The basic objective of cluster analysis is to discover natural groupings of items. In general, clustering is based on a similarity (or dissimilarity) matrix or on the original input data, and various measures of similarity between objects have been developed. In this paper, we consider the clustering of a huge real categorical data set that describes the time-location-activity patterns of Korean people. Some useful similarity measures for categorical variables are developed and adopted for this data set. Hierarchical and nonhierarchical clustering methods are applied to the data set, which is huge and consists of many categorical variables.
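The abstract does not say which similarity measures were developed. As an illustration only, the sketch below uses the simple matching dissimilarity, a common generic choice for categorical records; the function name and the sample records are hypothetical and not taken from the paper.

```python
def simple_matching_dissimilarity(a, b):
    """Fraction of categorical attributes on which two records disagree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


# Hypothetical (time, location, activity) records; values are illustrative only.
r1 = ("morning", "home", "sleep")
r2 = ("morning", "office", "work")
r3 = ("morning", "home", "meal")

d12 = simple_matching_dissimilarity(r1, r2)  # records differ in 2 of 3 attributes
d13 = simple_matching_dissimilarity(r1, r3)  # records differ in 1 of 3 attributes
```

A pairwise matrix of such values could then serve as the dissimilarity input to a hierarchical method.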

Deduction of Acupoints Selecting Elements on Zhenjiuzishengjing (鍼灸資生經) using hierarchical clustering

오준호
- 혜화의학회지 / Vol. 23, No. 1 / pp.115-124 / 2014
Objectives : Traditional East Asian medicine (TEAM) has many medical records of acupuncture and moxibustion. We performed this study to find the hidden criteria in these records for choosing proper acupoints. Methods : "Zhenjiuzishengjing", an ancient TEAM book, was analysed using document clustering techniques. A corpus was built from this book; it contained 196 texts, each derived from a symptom. Each text was converted to a vector representing the frequencies of 349 acupoints. Distances between vectors were calculated by the weighted Euclidean distance method, and a hierarchical clustering of symptoms was built from these distances. Results : The clustering consisted of five large groups. They were highly correlated with body part: head and face, chest, abdomen, upper extremity, lower extremity, back. Conclusions : The results suggest that the body part of a symptom is the most important criterion in acupoint selection; some highly similar symptom vectors support this result. The other criterion is the cause and pathway of illness: some symptoms that share a common cause and pathway were grouped together.
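The distance step described in the Methods can be sketched as follows. This is a minimal illustration, assuming a per-acupoint weight vector; the abstract does not state how the weights were chosen, and the symptom names, counts and weights below are hypothetical (four acupoints instead of 349).

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("vectors and weights must have equal length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical acupoint-frequency vectors for three symptoms.
headache = [3, 1, 0, 0]
dizziness = [2, 1, 0, 0]
knee_pain = [0, 0, 4, 1]
weights = [0.5, 1.0, 1.0, 2.0]  # assumed weights, for illustration only

d_head_dizzy = weighted_euclidean(headache, dizziness, weights)
d_head_knee = weighted_euclidean(headache, knee_pain, weights)
```

In this toy data the two head-related symptoms end up much closer to each other than either is to the knee symptom, which is the kind of body-part grouping the Results describe.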

Classification of network packets using hierarchical clustering

여인성;황성운
- 사물인터넷융복합논문지 / Vol. 3, No. 1 / pp.9-11 / 2017
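With the recent spread of the Internet and mobile devices, the number of network-based attacks by hackers is also increasing. Network communication takes place by exchanging packets, which carry a variety of information. We analyzed this packet information using hierarchical clustering, classified packets as normal or abnormal, and thereby detected attacks. With this analysis method, it should be possible to analyze new packets and detect attacks.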
https://doi.org/10.20465/KIOTS.2017.3.1.009

Results of Discriminant Analysis with Respect to Cluster Analyses Under Dimensional Reduction

Chae, Seong-San
- Communications for Statistical Applications and Methods / Vol. 9, No. 2 / pp.543-553 / 2002
Principal component analysis is applied to reduce p dimensions to q dimensions ($q \leq p$). Any partition of a collection of data points with p and q variables generated by the application of six hierarchical clustering methods is re-classified by discriminant analysis. From the application of discriminant analysis through each hierarchical clustering method, correct classification ratios are obtained. The results illustrate which method is more reasonable in exploratory data analysis.
https://doi.org/10.5351/CKSS.2002.9.2.543

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering

류중원;조성배
- 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2001년도 춘계학술대회 학술발표 논문집 / pp.187-190 / 2001
With the recent proliferation of the Internet, it is widely accepted that the need for automatic document categorization is rising. Since manual processing and classification is time-consuming and takes too much manpower, we need a system that categorizes documents automatically according to their contents. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. The first clusters all documents into 3 groups so that a first classifier assigns each input document to the corresponding group, and then takes the final result from a classifier trained separately within each class. The second classifies the most distinct classes first, according to their similarity, and proceeds successively. Neural networks are adopted as classifiers, and dendrograms are used to show the hierarchical structure of similarities between classes. A comparison of the performance of hierarchical and non-hierarchical classifiers shows that the clustering methods improve classification efficiency.

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining

황인수
- 경영과학 / Vol. 19, No. 1 / pp.189-196 / 2002
Data clustering is often one of the first steps in data mining analysis. It identifies groups of related objects that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper presents the development of a two-phase hierarchical clustering algorithm for group formation. Applications of the algorithm to product-customer group formation in customer relationship management are also discussed. In computer simulations, the suggested algorithm outperforms the single link method and k-means clustering.

An Incremental Similarity Computation Method in Agglomerative Hierarchical Clustering

Jung, Sung-young;Kim, Taek-soo
- 한국지능시스템학회논문지 / Vol. 11, No. 7 / pp.579-583 / 2001
In data clustering in high-dimensional space, one of the difficulties is the time-consuming computation of vector similarities. It becomes worse for the agglomerative algorithm with the group-average link and mean centroid methods, because cluster similarities must be recomputed whenever a cluster center moves after a merging step. As a solution to this problem, we present an incremental method of similarity computation, which substitutes scalar calculation for the time-consuming calculation of vector similarity, with several measures such as the squared distance, inner product, cosine, and minimum variance. Experimental results show that it significantly speeds up clustering for very high dimensional data.
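The abstract gives no formulas, so the sketch below is not the paper's method. It shows the classical Lance-Williams update for the centroid method with squared Euclidean distance, which illustrates the same general idea: after merging clusters A and B, the distance from any other cluster K to the merged cluster follows from three already-known scalars, with no new vector arithmetic. All function names and data points are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def merged_sq_dist(d_ka, d_kb, d_ab, n_a, n_b):
    """Squared centroid distance from K to (A merged with B), using scalars only."""
    n = n_a + n_b
    return (n_a / n) * d_ka + (n_b / n) * d_kb - (n_a * n_b / n ** 2) * d_ab


# Toy clusters in two dimensions (illustrative values).
A = [[0.0, 0.0], [2.0, 0.0]]
B = [[4.0, 4.0]]
K = [[10.0, 0.0], [10.0, 2.0]]

cA, cB, cK = centroid(A), centroid(B), centroid(K)
incremental = merged_sq_dist(
    sq_dist(cK, cA), sq_dist(cK, cB), sq_dist(cA, cB), len(A), len(B)
)
direct = sq_dist(cK, centroid(A + B))  # recomputed from vectors, for comparison
```

The scalar update costs a constant number of operations per cluster pair, whereas the direct recomputation scales with the vector dimension.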

A Simple Tandem Method for Clustering of Multimodal Dataset

Cho C.;Lee J.W.;Lee J.W.
- 한국경영과학회:학술대회논문집 / 한국경영과학회/대한산업공학회 2003년도 춘계공동학술대회 / pp.729-733 / 2003
The presence of local features within clusters, caused by the multi-modal nature of data, prevents many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when a technique with an implicit assumption of Gaussian distribution is used. The current study proposes a simple tandem clustering method composed of a k-means type algorithm and a hierarchical method to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting the multi-modal clusters but also fast and simple in terms of computational complexity and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly known real-world datasets.
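The abstract does not say how the Kullback-Leibler divergence between pre-clusters is estimated. One possible reading, shown here purely as an assumption, is to summarize each small pre-cluster by a univariate Gaussian and use the symmetrized closed-form KL divergence as the dissimilarity for the agglomerative step. The helper names and sample values are hypothetical.

```python
import math


def fit_gaussian(points):
    """Mean and (population) variance of a one-dimensional pre-cluster."""
    n = len(points)
    mu = sum(points) / n
    var = sum((x - mu) ** 2 for x in points) / n
    return mu, var


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def symmetric_kl(p, q):
    """Symmetrized KL divergence between two (mean, variance) pairs."""
    return gaussian_kl(*p, *q) + gaussian_kl(*q, *p)


# Hypothetical pre-clusters, e.g. as produced by a k-means first stage.
pre_a = fit_gaussian([0.0, 0.2, 0.4])
pre_b = fit_gaussian([0.5, 0.7, 0.9])
pre_c = fit_gaussian([5.0, 5.2, 5.4])

d_ab = symmetric_kl(pre_a, pre_b)
d_ac = symmetric_kl(pre_a, pre_c)
```

Under this assumption the neighbouring pre-clusters `pre_a` and `pre_b` would be merged before either is joined with the distant `pre_c`.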

A Study on Cluster Hierarchy Depth in Hierarchical Clustering

김해남;이신원;안동언;정성종
- 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2004년도 춘계학술발표대회 / pp.673-676 / 2004
Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provides a view of the data at different levels, which adapts large document collections to people's intuition and interests. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system named the CONDOR system [10], which builds a hierarchical structure based on document clustering using the K-means algorithm to "get the best of both worlds". The performance of the CONDOR system is compared with the VIVISIMO hierarchical clustering system [9], and performance is analyzed with respect to feature word selection for specific topics and the optimum hierarchy depth.

An Energy Consumption Model using Hierarchical Unequal Clustering Method

김진수;신승수
- 한국산학기술학회논문지 / Vol. 12, No. 6 / pp.2815-2822 / 2011
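In wireless sensor networks, clustering is a technique that uses energy efficiently by forming clusters, aggregating data, and transmitting it at once. This paper proposes a hierarchical unequal clustering method that uses a cluster group model. The method divides the whole network into two layers: data from layer 2, which is organized into cluster groups, is aggregated and sent to layer 1, where it is aggregated again and sent to the base station. By combining a multi-hop communication structure with the cluster group model, the proposed method reduces total energy consumption. Although it relies on multi-hop communication, building unequal clusters alleviates the hot spot problem to some extent. Experiments show that the proposed hierarchical unequal clustering method improves network energy efficiency over previous clustering methods.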
https://doi.org/10.5762/KAIS.2011.12.6.2815
