A Study on Incremental Learning Model for Naive Bayes Text Classifier

Naive Bayes 문서 분류기를 위한 점진적 학습 모델 연구

  • 김제욱 (서울대학교 컴퓨터공학부) ;
  • 김한준 (서울대학교 컴퓨터공학부) ;
  • 이상구 (서울대학교 컴퓨터공학부)
  • Published : 2001.07.01

Abstract

In the text classification domain, labeling the training documents is an expensive process because it requires human expertise and is a tedious, time-consuming task. Therefore, it is important to reduce the manual labeling of training documents while improving the text classifier. Selective sampling, a form of active learning, reduces the number of training documents that needs to be labeled by examining the unlabeled documents and selecting the most informative ones for manual labeling. We apply this methodology to Naive Bayes, a text classifier renowned as a successful method in text classification. One of the most important issues in selective sampling is to determine the criterion when selecting the training documents from the large pool of unlabeled documents. In this paper, we propose two measures that would determine this criterion : the Mean Absolute Deviation (MAD) and the entropy measure. The experimental results, using Renters 21578 corpus, show that this proposed learning method improves Naive Bayes text classifier more than the existing ones.

Keywords