Text Document Categorization using FP-Tree

A Document Classification Method Using FP-Tree

  • 박용기 (Department of Computer Science, Kyungpook National University);
  • 김황수 (Department of Computer Science, Kyungpook National University)
  • Published : 2007.11.15


As the volume of electronic documents grows explosively, automatic text categorization methods are needed to identify the documents of interest. Most existing methods apply machine learning techniques to a set of words. This paper introduces a new method called FPTC (FP-Tree based Text Classifier). The FP-Tree is a data structure used in data mining. We present a method that stores the sentence patterns of texts in an FP-Tree and classifies text using those patterns. In our experiments, we combine the algorithm with a Mutual Information and Entropy approach to improve performance. We also present an analysis of the algorithm via an ordinary differential categorization method.
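
To make the idea of storing sentence patterns in an FP-Tree more concrete, the following is a minimal Python sketch. It is not the FPTC algorithm itself, whose details the abstract does not give. It assumes that each sentence is reduced to its set of words, that words are ordered by descending frequency as in a standard FP-Tree, and that each node keeps per-class counts. The names `FPNode`, `build_tree`, and `classify`, the toy training data, and the count-summing score are all illustrative assumptions, and the Mutual Information and Entropy weighting mentioned above is omitted.

```python
from collections import Counter


class FPNode:
    """One tree node: a word plus per-class occurrence counts (assumed layout)."""

    def __init__(self, word=None):
        self.word = word
        self.class_counts = Counter()
        self.children = {}


def ordered_items(words, freq, min_support=1):
    """Deduplicate words, drop rare or unseen ones, sort by descending frequency."""
    kept = (w for w in set(words) if freq.get(w, 0) >= min_support)
    return sorted(kept, key=lambda w: (-freq[w], w))


def build_tree(labeled_sentences, min_support=1):
    """Build an FP-Tree-style prefix tree from (label, words) pairs."""
    # Pass 1: count in how many sentences each word appears.
    freq = Counter()
    for _, words in labeled_sentences:
        freq.update(set(words))

    # Pass 2: insert each sentence as a path, so shared prefixes share nodes.
    root = FPNode()
    for label, words in labeled_sentences:
        node = root
        for w in ordered_items(words, freq, min_support):
            node = node.children.setdefault(w, FPNode(w))
            node.class_counts[label] += 1
    return root, freq


def classify(root, freq, words):
    """Illustrative scoring: sum class counts along the matching path."""
    scores = Counter()
    node = root
    for w in ordered_items(words, freq):
        child = node.children.get(w)
        if child is None:
            continue  # skip words that do not extend the current path
        scores.update(child.class_counts)
        node = child
    return scores.most_common(1)[0][0] if scores else None


# Toy training data (hypothetical, for illustration only).
train = [
    ("sports", ["team", "wins", "match"]),
    ("sports", ["team", "loses", "match"]),
    ("finance", ["stock", "market", "falls"]),
    ("finance", ["stock", "market", "rises"]),
]
root, freq = build_tree(train)
prediction = classify(root, freq, ["the", "team", "match"])
```

Because frequent words are inserted first, sentences that share common words collapse onto the same prefix of the tree, which is what lets a compact structure hold many patterns. A fuller classifier would presumably weight nodes rather than sum raw counts.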


