Proceedings of the Korean Society for Language and Information Conference
- Korean Society for Language and Information, 2007 Annual Conference
- Pages 515-521
- 2007
Text Categorization for Authorship based on the Features of Lingual Conceptual Expression
- Zhang, Quan (The Institute of Acoustics, CAS) ;
- Zhang, Yun-liang (The Institute of Acoustics, CAS) ;
- Yuan, Yi (The Institute of Acoustics, CAS)
- Published: 2007.11.01
Abstract
Text categorization is an important area of automatic text information processing, and authorship identification can be treated as a special case of text categorization. This paper adopts a conceptual-primitive representation based on the Hierarchical Network of Concepts (HNC) theory, which describes word meanings with hierarchical symbols, in order to avoid the data sparseness caused by natural-language surface features in text categorization. The KNN algorithm is used as the classifier. An experiment on authorship identification of Chinese texts shows that the proposed approach achieves a high accuracy, indicating that it is feasible for text authorship identification.
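The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symbols such as `a12` are made-up stand-ins for hierarchical concept codes (not real HNC symbols), and the helper names `prefix_features`, `cosine`, and `knn_author`, the cosine similarity measure, and the majority vote are all assumptions chosen for illustration. Each symbol is expanded into all of its prefixes, so two texts that share only a higher-level concept still overlap, which is one plausible way a hierarchical encoding can ease data sparseness.

```python
from collections import Counter
import math


def prefix_features(symbols):
    """Count every prefix of each hierarchical symbol, so that symbols
    sharing an ancestor in the hierarchy also share features."""
    counts = Counter()
    for sym in symbols:
        for i in range(1, len(sym) + 1):
            counts[sym[:i]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_author(train, query_symbols, k=3):
    """Return the majority author among the k most similar training texts."""
    query = prefix_features(query_symbols)
    scored = sorted(
        ((cosine(prefix_features(syms), query), author) for syms, author in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(author for _, author in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (concept symbols of a text, author label). Hypothetical data.
train = [
    (["a12", "a13", "b21"], "author_A"),
    (["a11", "a12", "b22"], "author_A"),
    (["c31", "c32", "d41"], "author_B"),
    (["c33", "d41", "d42"], "author_B"),
]

# Neither "a14" nor "b23" occurs in the training data, but their prefixes do.
print(knn_author(train, ["a14", "b23"], k=3))  # prints author_A
```

With exact-match features the query above would share nothing with any training text; with prefix expansion it matches the `a1` and `b2` branches, so the two `author_A` texts rank highest.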
Keywords