A Clustering-based Semi-Supervised Learning through Initial Prediction of Unlabeled Data

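  • Kim, Eung-Ku (CS Management Center, Consulting Division, Korea Productivity Center)
  • Jun, Chi-Hyuck (Department of Industrial and Management Engineering, Pohang University of Science and Technology)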
  • Published : 2008.09.30

Abstract

Semi-supervised learning uses a small amount of labeled data both to predict the labels of unlabeled data and to improve clustering performance, whereas unsupervised learning analyzes only unlabeled data for clustering purposes. We propose a new clustering-based semi-supervised learning method that incorporates the initial predicted labels of unlabeled data into the objective function. The initial prediction is expressed as a discrete probability distribution over labels, obtained from a classification method trained on the labeled data. Clusters are then formed, and the label of each unlabeled data point is predicted from the information of the labeled data in the same cluster. We evaluate the proposed method and compare its performance in terms of classification error through numerical experiments in which the labels of part of the labeled data are blinded.
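
The abstract does not state the objective function or the classifier, so the following is only a minimal sketch of the general idea under explicit assumptions, not the authors' formulation. It assumes (1) a nearest-centroid classifier with a softmax over negative squared distances as the source of the initial discrete probability distribution, (2) a k-means-style objective in which the squared distance between label distributions is added to the squared feature distance with a hypothetical weight `lam`, and (3) a majority vote over the labeled points in a cluster as the final labeling rule. All function and parameter names are illustrative.

```python
import numpy as np

def initial_prediction(X_lab, y_lab, X_unl, n_classes):
    """Discrete class distribution for each unlabeled point.
    Assumption: nearest-centroid classifier with a softmax over -distance^2."""
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cluster_and_label(X_lab, y_lab, X_unl, n_classes, n_clusters,
                      lam=1.0, n_iter=50, seed=0):
    """K-means on features augmented with label distributions (assumed objective:
    ||x - mu||^2 + lam * ||p - q||^2), then majority-vote labeling per cluster."""
    rng = np.random.default_rng(seed)
    P_lab = np.eye(n_classes)[y_lab]  # labeled points: one-hot distributions
    P_unl = initial_prediction(X_lab, y_lab, X_unl, n_classes)
    Z = np.hstack([np.vstack([X_lab, X_unl]),
                   np.sqrt(lam) * np.vstack([P_lab, P_unl])])

    # Initialize cluster centers at randomly chosen data points.
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_clusters):
            if np.any(assign == k):  # keep the old center if a cluster is empty
                new_centers[k] = Z[assign == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment with respect to the final centers.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)

    n_lab = len(X_lab)
    lab_assign, unl_assign = assign[:n_lab], assign[n_lab:]
    # Fallback for clusters without labeled points: keep the initial prediction.
    y_pred = P_unl.argmax(axis=1)
    for k in range(n_clusters):
        members = y_lab[lab_assign == k]
        if members.size:
            majority = np.bincount(members, minlength=n_classes).argmax()
            y_pred[unl_assign == k] = majority
    return y_pred, assign

# Toy data: two well-separated Gaussian blobs, three labeled points per class.
rng = np.random.default_rng(42)
blob0 = rng.normal(loc=0.0, scale=0.5, size=(23, 2))
blob1 = rng.normal(loc=10.0, scale=0.5, size=(23, 2))
X_lab = np.vstack([blob0[:3], blob1[:3]])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([blob0[3:], blob1[3:]])
y_true = np.array([0] * 20 + [1] * 20)  # blinded labels, used only for evaluation

y_pred, assign = cluster_and_label(X_lab, y_lab, X_unl, n_classes=2, n_clusters=2)
error_rate = float(np.mean(y_pred != y_true))
```

In this sketch the blinded labels `y_true` play the role of the hidden labels used for evaluation in the abstract: they are never seen by the method and serve only to compute the classification error. Setting `lam=0` reduces the clustering step to plain k-means on the features, which makes the influence of the initial predictions easy to inspect.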
