Integrated Search | Korea Science

Inverted Index based Modified Version of K-Means Algorithm for Text Clustering

Jo, Tae-Ho
- Journal of Information Processing Systems, Vol. 4, No. 2, pp. 67-76, 2008
This research proposes a new strategy in which documents are encoded into string vectors and the k-means algorithm is modified to operate on string vectors for text clustering. Traditionally, when the k-means algorithm is used for pattern classification, raw data must be encoded into numerical vectors. Depending on the application area, this encoding may be difficult. In text clustering, for example, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the k-means algorithm so that it can be applied to string vectors for text clustering.
https://doi.org/10.3745/JIPS.2008.4.2.067
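To make the dimensionality and sparsity problem concrete, the sketch below encodes a few toy documents as term-count vectors over a shared vocabulary. For contrast, it also builds a fixed-length "string vector" from each document's d most frequent words. Treating a string vector as the top-d words is an assumption made for illustration, since the abstract does not define the encoding. The documents, the function names, and d = 2 are all hypothetical.

```python
from collections import Counter

def term_count_vectors(docs):
    """Numeric bag-of-words encoding: one dimension per vocabulary word."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]

def string_vector(doc, d):
    """Hypothetical string-vector encoding: the d most frequent words.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(doc.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:d]

docs = [
    "k means clusters documents",
    "documents are encoded as vectors vectors",
    "string vectors keep words",
]
vocab, vectors = term_count_vectors(docs)
# Count zero entries: most of each numeric vector is empty.
zeros = sum(v == 0 for row in vectors for v in row)
print(len(vocab), zeros / (len(vocab) * len(docs)))
print([string_vector(doc, 2) for doc in docs])
```

Even with three short documents, every numeric vector has one dimension per vocabulary word and is mostly zeros. Each string vector, by contrast, stays at length d no matter how large the vocabulary grows.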

A Study of Similar Blog Recommendation System Using Termite Colony Algorithm

정기성;조이석;이말례
- The Journal of the Institute of Internet, Broadcasting and Communication, Vol. 13, No. 1, pp. 83-88, 2013
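The purpose of this study is to find topics according to topic similarity through a similar-blog recommendation system. Building such a system requires clustering a large data set so that groups of similar items can be found. For clustering, both the technique and the number of clusters must be chosen to suit the clustering objective. We used the K-means algorithm, the most widely used clustering technique, and adopted the termite colony algorithm as the recommendation algorithm. The clustering technique based on the termite behavior model addresses the K-means difficulty of choosing an appropriate number of clusters, shortens clustering time, and reduces the number of cluster-mean moves required for clustering.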
https://doi.org/10.7236/JIIBC.2013.13.1.83

Modified K-means algorithm

김형철;조제황
- Proceedings of the Acoustical Society of Korea Conference: 1999 Annual Conference, Vol. 18, No. 2, pp. 115-118, 1999
One of the typical methods for designing a codebook is the K-means algorithm. This algorithm has two drawbacks: it converges to a locally optimal codebook, and its performance depends mainly on the initial codebook. D. Lee's method is almost the same as the K-means algorithm except for a modified distance value. Both methods keep the distance value fixed during all iterations. After many iterations, the distance between the new and old codevectors is much shorter than in the early iterations, so the new codevectors are hardly affected by the distance value. In contrast, the new codevectors determined in the early learning iterations are strongly affected by it. It is therefore not appropriate to fix the distance value throughout. In this paper, we propose a new algorithm that uses a different distance value for each codevector during a limited number of early learning iterations. Experimental results show that the proposed method designs better codebooks than the conventional K-means algorithms.
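The abstract gives no formula for the modified update, so what follows is only a sketch under an assumption. It treats the "distance value" as a scale factor `s` on each codevector's move toward its cell centroid, c_new = c_old + s * (centroid - c_old). That factor is applied only during the first `early_iters` iterations, after which s = 1, which is the plain K-means update. The names `design_codebook`, `s`, and `early_iters` are hypothetical. The sketch also uses one shared `s` rather than the per-codevector values the paper proposes.

```python
def nearest(codebook, x):
    """Index of the codevector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))

def design_codebook(data, codebook, iters=20, s=1.5, early_iters=3):
    """K-means codebook design with a hypothetical scale factor s.

    In the first `early_iters` iterations each codevector moves s times the
    distance toward its cell centroid (s > 1 overshoots); afterwards the
    scale is 1.0, which is the plain K-means (centroid) update.
    """
    codebook = [list(c) for c in codebook]
    for it in range(iters):
        # Partition the training vectors into nearest-codevector cells.
        cells = [[] for _ in codebook]
        for x in data:
            cells[nearest(codebook, x)].append(x)
        scale = s if it < early_iters else 1.0
        for j, cell in enumerate(cells):
            if not cell:
                continue  # empty cell: keep the old codevector
            centroid = [sum(col) / len(cell) for col in zip(*cell)]
            codebook[j] = [c + scale * (m - c) for c, m in zip(codebook[j], centroid)]
    return codebook

def distortion(data, codebook):
    """Mean squared quantization error of the data under the codebook."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, codebook[nearest(codebook, x)]))
               for x in data) / len(data)

training = [(0.0,), (1.0,), (10.0,), (11.0,)]
cb = design_codebook(training, [(0.0,), (1.0,)])
print(cb, distortion(training, cb))
```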

The Document Clustering using Multi-Objective Genetic Algorithms

이정송;박순철
- Journal of the Korea Society of Industrial Information Systems, Vol. 17, No. 2, pp. 57-64, 2012
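This paper proposes a multi-objective genetic algorithm for document clustering, which is an important part of text mining. A key element of document clustering is the clustering algorithm that groups similar documents. Much of the research on document clustering so far has used k-means clustering, genetic algorithms, and similar methods. However, the performance of k-means clustering varies greatly with the initial cluster centers, and genetic algorithms tend to get trapped in local optima depending on the objective function. To address these drawbacks, we apply a multi-objective genetic algorithm to document clustering and compare its accuracy with that of existing algorithms. Performance tests show that the proposed multi-objective genetic algorithm performs substantially better than k-means clustering (about 20%) and the conventional genetic algorithm (about 17%).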
https://doi.org/10.9723/jksiis.2012.17.2.057

A Codebook Generation Algorithm Using a New Updating Condition

김형철;조제황
- Journal of the Institute of Convergence Signal Processing, Vol. 5, No. 3, pp. 205-209, 2004
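The K-means algorithm is the most widely used codebook generation algorithm in vector quantization. This paper proposes a codebook generation algorithm that applies a new updating condition to improve codebook performance. The conventional K-means algorithm keeps the distance weight fixed whenever codevectors are updated during the learning iterations. The proposed method instead obtains the codebook by applying different weights according to an updating condition on the new codevectors. Because the updating condition allows different weights to be applied to the codevectors, the effect is a weight that varies at each learning iteration. Experimental results confirm that the resulting codebooks outperform those of the K-means algorithm.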

Fast K-Means Clustering Algorithm using Prediction Data

지태창;이현진;이일병
- The Journal of the Korea Contents Association, Vol. 9, No. 1, pp. 106-114, 2009
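This paper proposes a method for speeding up the K-Means clustering algorithm. The key feature of the proposed algorithm is that it predicts which data points are likely to change clusters. At each step of the algorithm, distances to the cluster centers are computed only for the data points whose cluster assignment may change, which reduces total clustering time. The prediction reuses distance information already produced while K-Means runs, so it adds little computation. Because it relies on distances, the algorithm is also less affected by the number of dimensions. For performance comparison, the proposed algorithm was evaluated against Lloyd's algorithm (the original K-Means) and KMHybrid, an improved variant of it. For large data (large input size, high dimensionality, and many clusters), the proposed algorithm achieved greater speedups than both Lloyd's and KMHybrid.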
https://doi.org/10.5392/JKCA.2009.9.1.106
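The abstract does not state the prediction rule. To illustrate the same idea of reusing distances from the previous iteration to skip points whose cluster cannot change, the sketch below substitutes a standard triangle-inequality bound rather than the authors' method. Suppose a point's stored distance to its own center, plus how far that center moved, is still smaller than its stored distance to every other center minus how far that center moved. Then its assignment cannot change, and no new distances are computed for it. The function name `kmeans_filtered` and the distance counter are hypothetical. The final labels match plain Lloyd's algorithm with the same initial centers.

```python
import math

def kmeans_filtered(points, centers, max_iter=100):
    """Lloyd's k-means that skips points whose assignment provably cannot
    change, using triangle-inequality bounds on stored distances.

    Returns (centers, labels, number_of_distance_computations).
    """
    centers = [tuple(c) for c in centers]
    k = len(centers)
    # First pass: exact point-to-center distances for every point.
    d = [[math.dist(p, c) for c in centers] for p in points]
    labels = [min(range(k), key=lambda j: row[j]) for row in d]
    computed = len(points) * k
    for _ in range(max_iter):
        # Update step: move each center to the mean of its points.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = [math.dist(a, b) for a, b in zip(centers, new_centers)]
        centers = new_centers
        changed = False
        for i, p in enumerate(points):
            a = labels[i]
            upper = d[i][a] + shift[a]  # upper bound on distance to own center
            lower = min((d[i][j] - shift[j] for j in range(k) if j != a),
                        default=math.inf)  # lower bound on any other center
            if upper < lower:
                # Own center is still strictly nearest: keep the bounds, skip.
                d[i][a] = upper
                for j in range(k):
                    if j != a:
                        d[i][j] -= shift[j]
                continue
            # Bound test failed: recompute exact distances for this point.
            d[i] = [math.dist(p, c) for c in centers]
            computed += k
            new_label = min(range(k), key=lambda j: d[i][j])
            if new_label != a:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return centers, labels, computed
```

In a run on well-separated data, most points pass the bound test after the first few iterations. Only the points near a cluster boundary are re-measured, which matches the abstract's goal of computing distances only for data likely to change clusters. `math.dist` requires Python 3.8 or later.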

Clustering load patterns recorded from advanced metering infrastructure

안효정;임예지
- The Korean Journal of Applied Statistics, Vol. 34, No. 6, pp. 969-977, 2021
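In this study, we clustered the electricity usage patterns of households in apartment complex A in Seoul using a Hierarchical K-means clustering algorithm. Hierarchical K-means identifies patterns while reducing dimensionality and compensates for the shortcomings of the standard K-means algorithm, and it has recently been applied to large-scale electricity usage data. We applied the clustering algorithm to hourly electricity consumption data for the summer evening peak hours and compared the results obtained with various numbers of clusters and levels. The results confirm that the patterns cluster according to usage, and we compared them using cluster validity indices.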
https://doi.org/10.5351/KJAS.2021.34.6.969
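The abstract compares results using cluster validity indices but does not name them. For illustration, here is a minimal sketch of one common index, the mean silhouette coefficient, with the choice of index assumed. For each point, a is its mean distance to the other members of its own cluster, and b is the smallest mean distance to the members of any other cluster. The point's score is (b - a) / max(a, b), so values near 1 indicate compact, well-separated clusters. The function name and data are hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters.

    Points in singleton clusters score 0 by convention.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(                   # nearest other cluster, by mean distance
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0,), (1,), (10,), (11,)]
print(silhouette(pts, [0, 0, 1, 1]), silhouette(pts, [0, 1, 0, 1]))
```

The natural split scores close to 1, while the interleaved labeling scores below 0. This is the kind of contrast a validity index is meant to expose when the number of clusters or levels is varied.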

Machine-Part Grouping in Cellular Manufacturing Systems Using a Self-Organizing Neural Networks and K-Means Algorithm

이상섭;이종섭;강맹규
- Journal of Society of Korea Industrial and Systems Engineering, Vol. 23, No. 61, pp. 137-146, 2000
One of the problems faced in implementing cellular manufacturing systems is machine-part group formation. This paper proposes machine-part grouping algorithms for cellular manufacturing systems based on Self-Organizing Map (SOM) neural networks and the K-Means algorithm. Although the SOM spreads input vectors over output vectors in order of similarity, it does not always find the optimal solution. We rearrange the input vectors using the SOM and determine the number of groups. To find the number of groups and the grouping efficacy, we iterate the K-Means algorithm with increasing k until no better solution can be obtained. The results of the proposed approach are compared with the best solutions reported in the literature. The computational results show that the proposed approach provides a powerful means of solving the machine-part grouping problem. Because the proposed algorithm requires only simple calculations, designers can easily re-apply it when production constraints change.
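The abstract uses grouping efficacy as the quality measure while k is varied. Below is a minimal sketch of that measure, assuming the commonly used definition Γ = (e - e_out) / (e + e_void). Here e is the number of 1s in the machine-part incidence matrix, e_out counts exceptional elements (1s outside the diagonal blocks), and e_void counts voids (0s inside a block). The function name and the example matrix are hypothetical and are not data from the paper.

```python
def grouping_efficacy(matrix, machine_cell, part_family):
    """Grouping efficacy of a machine-part incidence matrix.

    matrix[i][j] == 1 if machine i processes part j.
    machine_cell[i] and part_family[j] are group indices; a (machine, part)
    cell lies inside a block when both have the same group index.
    """
    e = e_out = e_void = 0
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            inside = machine_cell[i] == part_family[j]
            if v:
                e += 1
                if not inside:
                    e_out += 1   # exceptional element
            elif inside:
                e_void += 1      # void inside a block
    return (e - e_out) / (e + e_void)

incidence = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
print(grouping_efficacy(incidence, [0, 0, 1, 1], [0, 0, 1, 1]))
```

A perfectly block-diagonal matrix scores 1.0. Each exceptional element or void lowers the score. An outer loop could run K-Means for k = 2, 3, ... and stop once Γ no longer improves, as the abstract describes, but that loop is not shown here.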

Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm

이신원;안동언;정성종
- Journal of the Korean Society for Information Management, Vol. 21, No. 4, pp. 173-185, 2004
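As information and communication technology has advanced, the volume of information has grown and the result lists returned for user queries have become longer, so fast, high-quality document clustering algorithms now play an important role. Many papers achieve good performance with hierarchical clustering methods, but these methods are time-consuming. The K-means algorithm, by contrast, offers lower time complexity. In this paper, we implemented simple, high-quality, and efficient information retrieval in Condor, a hierarchical clustering system. The system uses the K-Means algorithm and achieved 88% precision by tuning the cluster hierarchy depth and the initial centroids.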
https://doi.org/10.3743/KOSIM.2004.21.4.173
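As a hedged illustration of hierarchical clustering built from K-means, the sketch below splits each cluster again with k-means until a maximum depth is reached. The parameters `k` and `depth`, the deterministic initialization (the first k distinct points), and the 1-D data are assumptions. The abstract does not say how the Condor system chooses its initial centroids or hierarchy depth.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; initial centroids are the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels

def hierarchical_kmeans(points, k=2, depth=2):
    """Recursively split clusters with k-means, up to `depth` levels.

    Returns a nested list: a leaf is a list of points (tuples), and an
    internal node is a list of child subtrees.
    """
    if depth == 0 or len(set(points)) < k:
        return points
    labels = kmeans(points, k)
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    return [hierarchical_kmeans(g, k, depth - 1) for g in groups if g]

pts = [(0,), (1,), (10,), (11,), (100,), (101,), (110,), (111,)]
print(hierarchical_kmeans(pts, k=2, depth=2))
```

Increasing `depth` yields finer clusters at the cost of more k-means runs. Changing how the initial centroids are picked can change which split each level finds, which is why the abstract treats both as tuning knobs.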

Fuzzy k-Means Local Centers of the Social Networks

Woo, Won-Seok;Huh, Myung-Hoe
- Communications for Statistical Applications and Methods, Vol. 19, No. 2, pp. 213-217, 2012
Fuzzy k-means clustering is an attractive alternative to ordinary k-means clustering for analyzing multivariate data. Fuzzy versions yield more natural output by allowing the k groups to overlap. In this study, we modify a fuzzy k-means clustering algorithm for use on undirected social networks, apply the algorithm to both real and simulated cases, and report the results.
https://doi.org/10.5351/CKSS.2012.19.2.213
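For reference, here is a minimal sketch of the standard fuzzy k-means (fuzzy c-means) updates on ordinary numeric vectors, with fuzzifier m > 1. Memberships are u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1)), and each center is the mean of the points weighted by u_ij^m. This sketch does not include the paper's modification for undirected social networks. The function name, m = 2, the tolerance, and the data are assumptions.

```python
import math

def fuzzy_kmeans(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means on numeric vectors.

    Returns (centers, memberships). memberships[i][j] is the degree to
    which point i belongs to cluster j (each row sums to 1); it is computed
    from the centers as they were before the final center update.
    """
    centers = [tuple(map(float, c)) for c in centers]
    k = len(centers)
    u = []
    for _ in range(iters):
        # Membership update from distances to the current centers.
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if 0.0 in d:
                # Point sits exactly on a center: crisp membership there.
                j0 = d.index(0.0)
                u.append([1.0 if j == j0 else 0.0 for j in range(k)])
            else:
                u.append([1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(k))
                          for j in range(k)])
        # Center update: mean of the points weighted by u^m.
        new = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / total
                             for dim in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers, u

cs, u = fuzzy_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(cs, u)
```

Unlike ordinary k-means, every point keeps a nonzero membership in every cluster, which is the overlap the abstract refers to. Larger m makes the memberships more even across clusters.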

1,321 results found (processing time: 0.026 s)
