An Ensemble Clustering Algorithm based on a Prior Knowledge

Ko, Song; Kim, Dae-Won

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 36 Issue 2 / Pages 109-121 / 2009 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

사전정보를 활용한 앙상블 클러스터링 알고리즘

Ko, Song (Department of Computer Engineering, Chung-Ang University) ;
Kim, Dae-Won (Department of Computer Engineering, Chung-Ang University)

Published : 2009.02.15

Abstract

Although prior knowledge can improve clustering performance, the gain depends on how that knowledge is used. In particular, when prior knowledge is used to construct the initial centroids of the clusters, the similarities among the prior-knowledge objects must be taken into account. Even when some prior-knowledge objects share the same label, objects whose mutual similarity is low should be separated. Separating them keeps an initial centroid from being built out of objects that have low similarity to one another. The separated prior knowledge can then be used in various ways, for example to produce various initializations. So that association rules can be applied, the proposed method generates a sufficiently large number of clusters, and the initial centroids of these clusters are constructed from the separated prior knowledge. The ensemble of the resulting clusterings outperforms the approach in which prior knowledge is not separated.

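Prior knowledge can improve clustering performance, but the outcome differs depending on how it is used. In particular, when prior knowledge is used as initial centroids, the similarity among the prior-knowledge objects needs to be considered. Because prior-knowledge objects that share a label but have low similarity can cause problems when the initial centroids are set, a method that separates them before use is needed. This paper therefore presents a method that resolves the problem by separating prior knowledge with low similarity. In addition, using the similarity-separated prior knowledge in various ways yields diverse clustering results, and combining these results into a single integrated analysis through an association-rule-based ensemble further improves clustering performance.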
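
The separation step described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how similarity is measured or how the split is made, so the Euclidean distance, the threshold `max_dist`, the greedy grouping rule, and the names `split_seeds` and `mean` are all hypothetical choices made for the example. The association-rule ensemble step is not shown.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def split_seeds(seeds, max_dist):
    """Group labeled seed objects so that dissimilar same-label objects are kept apart.

    seeds:    list of (point, label) pairs (the prior knowledge)
    max_dist: hypothetical distance threshold; a seed joins an existing group
              only if it has the same label AND lies within max_dist of that
              group's current mean (Euclidean distance is an assumption)
    Returns one (label, centroid) pair per group, in order of creation.
    """
    groups = []  # each entry is (label, [points])
    for point, label in seeds:
        for group_label, members in groups:
            if group_label == label and math.dist(point, mean(members)) <= max_dist:
                members.append(point)
                break
        else:
            # No sufficiently similar group with this label: start a new one.
            groups.append((label, [point]))
    return [(label, mean(members)) for label, members in groups]


seeds = [
    ((0.0, 0.0), "A"),
    ((0.0, 2.0), "A"),
    ((10.0, 10.0), "A"),  # same label, but far from the other "A" seeds
    ((5.0, 0.0), "B"),
]

# Separated prior knowledge: label "A" contributes two initial centroids.
centroids = split_seeds(seeds, max_dist=3.0)

# Unseparated baseline: one centroid per label, pulled between distant seeds.
naive_a = mean([p for p, lab in seeds if lab == "A"])
```

In this toy data, averaging all three "A" seeds places the initial centroid between two distant groups, close to neither, whereas the split yields one centroid near each group. The resulting centroids could then be passed as initial centers to any centroid-based clustering routine, and varying `max_dist` is one way to obtain the differing initializations that an ensemble would combine.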
