A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method

Improving Document Categorization Performance by Applying the Okapi BM25 Word Weighting Method

  • 이용훈 (Department of Computer Science, Dankook University)
  • 이상범 (Department of Computer Science, Dankook University)
  • Received : 2010.10.21
  • Accepted : 2010.12.17
  • Published : 2010.12.31


Text categorization is one of the important functions of an information retrieval system: it classifies documents according to predefined criteria. The usual approach extracts important index terms from the target documents and assigns weights to them, then performs classification on the weighted terms. Because the performance and accuracy of categorization depend heavily on this weighting algorithm, its effectiveness is critical. In this paper, we introduce a method that improves text categorization by improving the word weighting technique. Okapi BM25 has proven effective in several information retrieval engines; we apply it to categorization and show that it performs well there. It is compared with other word weighting methods: TF-IDF, TF-ICF and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. The modified Okapi BM25 achieves the best performance of all the methods compared.
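To make the contrast between the weighting schemes concrete, the sketch below computes standard Okapi BM25 and plain TF-IDF term weights for tokenized documents. It is an illustration under stated assumptions, not the paper's modified BM25 (whose changes are not described here): it uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), the common defaults k1 = 1.2 and b = 0.75, and the hypothetical helper names `bm25_weights` and `tfidf_weights`. TF-ICF and TF-ISF are not shown.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    Assumption: standard Okapi BM25 term weight with the non-negative
    IDF variant; k1 and b are common defaults, not the paper's values.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        w = {}
        for term, f in tf.items():
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates: the weight approaches idf * (k1 + 1).
            w[term] = idf * f * (k1 + 1) / (f + norm)
        weights.append(w)
    return weights


def tfidf_weights(docs):
    """Plain TF-IDF: raw term frequency times log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [{t: f * math.log(n_docs / df[t]) for t, f in Counter(d).items()}
            for d in docs]


# Hypothetical toy corpus (not Reuters-21578 data).
docs = [["oil", "price", "oil"], ["grain", "price"], ["oil", "export", "price"]]
bm = bm25_weights(docs)
tfidf = tfidf_weights(docs)
```

Two design differences show up even on this toy corpus: BM25 saturates repeated terms (two occurrences of "oil" weigh less than twice one occurrence) and normalizes for document length, whereas TF-IDF grows linearly with frequency and gives a term present in every document, such as "price", a weight of zero.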


Text Categorization; Document Classification; TF-IDF; TF-ICF; TF-ISF; Okapi BM25; SVM; Reuters-21578


Supported by: Dankook University


