A Study on Building Structures and Processes for Intelligent Web Document Classification

지능적인 웹문서 분류를 위한 구조 및 프로세스 설계 연구

  • 장영철 (Department of Multimedia Broadcasting, Kyungmin College)
  • Published : 2008.12.31

Abstract

This paper proposes an approach based on intelligent document classification for building a user-centric information retrieval system that accepts users' own linguistic expressions. To this end, it proposes structures for expressing user intention and a fine-grained document classification process that uses EBL, similarity, a knowledge base, and user intention. To avoid the need for large amounts of exact semantic information, a hybrid process is designed that integrates keyword, thesaurus, probability, and user intention information. A user intention tree hierarchy is built, and a method for extracting group intention between keywords and user intentions is proposed. These structures and processes are implemented in the HDCI (Hybrid Document Classification with Intention) system. HDCI consists of two stages: analyzing user intention and classifying web documents. The classifying stage is composed of a knowledge base process, a similarity process, and a hybrid coordinating process. With the help of the user-intention-related structures and the hybrid coordinating process, HDCI can efficiently categorize web documents according to a user's complex linguistic expression using little prior information.
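
The abstract does not give HDCI's formulas, so the following is only a minimal illustrative sketch of what a hybrid score combining keyword, thesaurus, and user intention information might look like. Every name here (`THESAURUS`, `hybrid_score`, `classify`, the blending weight `alpha`) and the sample data are assumptions made for illustration, not the paper's implementation. The probability component and the intention tree hierarchy are not modeled.

```python
from collections import Counter
import math

# Hypothetical thesaurus mapping a term to related terms (illustrative data only).
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

def expand(terms, thesaurus):
    """Return the given terms plus their thesaurus neighbours."""
    expanded = set(terms)
    for term in terms:
        expanded |= thesaurus.get(term, set())
    return expanded

def cosine(a, b):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_score(doc_tokens, query_terms, intention_weights,
                 thesaurus=THESAURUS, alpha=0.5):
    """Blend thesaurus-expanded similarity with an intention-weighted keyword match.

    alpha is an assumed blending weight, not a value from the paper.
    """
    doc = Counter(doc_tokens)
    query = Counter(expand(query_terms, thesaurus))
    similarity = cosine(doc, query)
    total = sum(intention_weights.values())
    matched = sum(w for term, w in intention_weights.items() if term in doc)
    intention = matched / total if total else 0.0
    return alpha * similarity + (1 - alpha) * intention

def classify(doc_tokens, categories):
    """Pick the category whose (query_terms, intention_weights) pair scores highest."""
    return max(categories,
               key=lambda name: hybrid_score(doc_tokens, *categories[name]))
```

In this sketch, a document that never mentions "car" or "buy" can still match an automobile-related category through the thesaurus expansion, while the intention weights let terms the user cares about most count for more.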

Keywords

document classification; intention hierarchy; thesaurus; similarity; semantic weight