Active Learning Based on Hierarchical Clustering

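  • Hoyoung Woo (Department of Computer Engineering, Chungnam National University)
  • Cheong Hee Park (Department of Computer Engineering, Chungnam National University)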
  • Received : 2013.05.14
  • Accepted : 2013.07.02
  • Published : 2013.10.31

Abstract

Active learning aims to improve a classification model by repeatedly selecting the unlabeled samples that are most helpful for training, having an expert label them, and adding them to the training set. In this paper, we propose an active learning method based on hierarchical agglomerative clustering with Ward's linkage. The proposed method either actively constructs an initial training set that includes at least one sample from each cluster, or expands an existing training set, so that the training set reflects the overall data distribution. While most existing active learning methods assume that an initial training set is given, the proposed method is applicable whether or not initial training data is available. Experimental results show that the proposed method outperforms the compared methods.

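Given a training set consisting of a small number of labeled samples, active learning aims to improve classifier performance by repeatedly selecting the unlabeled samples most likely to help train the classifier and adding them to the training set after labeling by an expert. This paper proposes an active learning method based on hierarchical clustering with Ward's linkage. The proposed method reflects the overall data distribution by either actively constructing an initial training set that contains at least one sample from each cluster or expanding an existing training set. Whereas most existing active learning methods assume that an initial training set is given, the proposed method applies both when training data with initial class labels is available and when it is not. Experiments show that, compared with the competing methods, the proposed method selects data effectively enough to substantially improve classifier performance.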
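
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a sample is chosen within a cluster or how the number of clusters is set, so the function names (`ward_clusters`, `select_queries`), the fixed cluster count `k`, and the "closest to the centroid" rule are all assumptions made for illustration. The clustering is a naive pure-Python version of agglomerative clustering with Ward's criterion, suitable only for small data.

```python
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ward_clusters(data, k):
    """Naive agglomerative clustering with Ward's criterion.

    Starts from singletons and repeatedly merges the pair of clusters whose
    merge increases the within-cluster sum of squares the least:
        cost(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    Returns k clusters as lists of indices into `data`.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([data[i] for i in clusters[a]])
            cb = centroid([data[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * sq_dist(ca, cb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for j, c in enumerate(clusters) if j not in (a, b)]
        clusters.append(merged)
    return clusters


def select_queries(data, k, labeled=()):
    """Pick one sample to label from every cluster with no labeled sample yet.

    The representative is the member closest to the cluster centroid
    (an illustrative assumption, not a rule taken from the paper).
    """
    labeled = set(labeled)
    queries = []
    for members in ward_clusters(data, k):
        if labeled.intersection(members):
            continue  # cluster is already represented in the training set
        c = centroid([data[i] for i in members])
        queries.append(min(members, key=lambda i: sq_dist(data[i], c)))
    return sorted(queries)


# Three well-separated groups of three points each (toy data).
data = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
        (5.0, 5.0), (5.2, 5.0), (5.1, 5.1),
        (10.0, 0.0), (10.2, 0.0), (10.1, 0.1)]

print(select_queries(data, k=3))               # no initial training set: [2, 5, 8]
print(select_queries(data, k=3, labeled=[4]))  # sample 4 already labeled: [2, 8]
```

The two calls correspond to the two settings mentioned in the abstract: with no initial training set, every cluster contributes one query; with an existing training set, only clusters that are not yet represented contribute.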
