Abstract
The majority of text categorization systems use the term-based classification method. However, because of too many terms, this method is not effective to classify the documents in areal-time environment. This paper presents a real-time concept-based text categorization system,which classifies texts using thesaurus. The system consists of a Korean morphological analyzer, athesaurus tool, and a probability-vector similarity measurer. The thesaurus tool acquires the meaningsof input terms and represents the text with not the term-vector but the concept-vector. Because theconcept-vector consists of semantic units with the small size, it makes the system enable to analyzethe text with real-time. As representing the meanings of the text, the vector supports theconcept-based classification. The probability-vector similarity measurer decides the subject of the textby calculating the vector similarity between the input text and each subject. In the experimentalresults, we show that the proposed system can effectively analyze texts with real-time and do aconcept-based classification. Moreover, the experiment informs that we must expand the thesaurustool for the better system.