- Volume 6 Issue 4
This paper proposes a solution based on intelligent document classification for building a user-centric information retrieval system that accepts the user's own linguistic expressions. To this end, it proposes structures that express user intention and a fine-grained document classification process that uses EBL, similarity, a knowledge base, and user intention. To avoid the need for large amounts of exact semantic information, a hybrid process is designed that integrates keyword, thesaurus, probability, and user intention information. A user intention tree hierarchy is built, and a method for extracting group intention between keywords and user intentions is proposed. These structures and processes are implemented in the HDCI (Hybrid Document Classification with Intention) system. HDCI consists of two stages: analyzing user intention and classifying web documents. The classifying stage is composed of a knowledge base process, a similarity process, and a hybrid coordinating process. With the help of the user-intention structures and the hybrid coordinating process, HDCI can efficiently categorize web documents according to a user's complex linguistic expression using little a priori information.
document classification; intention hierarchy; thesaurus; similarity; semantic weight
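
The abstract does not give HDCI's formulas, so the following is only a minimal sketch of what a hybrid coordinating step might look like: a weighted combination of a keyword-rule score (standing in for the knowledge base process) and a thesaurus-normalized cosine similarity (standing in for the similarity process). The thesaurus entries, the function names, the scoring rules, and the `weight` parameter are all illustrative assumptions, not taken from the paper; the intention hierarchy and EBL are omitted.

```python
from collections import Counter
from math import sqrt

# Hypothetical thesaurus mapping surface terms to a canonical concept
# (an assumption for illustration; not from the paper).
THESAURUS = {"car": "automobile", "auto": "automobile", "price": "cost"}


def normalize(tokens):
    """Replace each token with its thesaurus concept when one exists."""
    return [THESAURUS.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two token lists treated as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def knowledge_base_score(doc_tokens, rule_terms):
    """Fraction of a category's rule terms that occur in the document."""
    if not rule_terms:
        return 0.0
    present = set(doc_tokens)
    return sum(1 for t in rule_terms if t in present) / len(rule_terms)


def hybrid_score(doc, query, rule_terms, weight=0.5):
    """Assumed coordination rule: a convex combination of the two scores."""
    d = normalize(doc.lower().split())
    q = normalize(query.lower().split())
    r = normalize(rule_terms)
    return weight * knowledge_base_score(d, r) + (1 - weight) * cosine(d, q)
```

For example, `hybrid_score("cheap car price", "auto cost", ["automobile"])` scores the document against a user query and a one-term category rule; because of the thesaurus step, "car" and "auto" match even though the strings differ. A document sharing no concepts with either the query or the rule scores zero.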