• Title/Summary/Keyword: Compound Field-associated Term(FT)

Search Result 2, Processing Time 0.015 seconds

Collection and Extraction Algorithm of Field-Associated Terms (분야연상어의 수집과 추출 알고리즘)

  • Lee, Sang-Kon;Lee, Wan-Kwon
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.347-358
    • /
    • 2003
  • VSField-associated term is a single or compound word whose terms occur in any document, and which makes it possible to recognize a field of text by using common knowledge of human. For example, human recognizes the field of document such as or , a field name of text, when she encounters a word 'Pitcher' or 'election', respectively We Proposes an efficient construction method of field-associated terms (FTs) for specializing field to decide a field of text. We could fix document classification scheme from well-classified document database or corpus. Considering focus field we discuss levels and stability ranks of field-associated terms. To construct a balanced FT collection, we construct a single FTs. From the collections we could automatically construct FT's levels, and stability ranks. We propose a new extraction algorithms of FT's for document classification by using FT's concentration rate, its occurrence frequencies.

An Extraction Algorithm of Compound Field-associated Terms for Korean Document Classifications (한글문서 분류용으로 이용할 복합어로 구성된 분야연상어의 추출법)

  • Lee, Samuel Sang-kon
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.7
    • /
    • pp.636-649
    • /
    • 2005
  • Field-associated Terms itself have field Information. So, they determine field of document just like when human being perceives field. In case of Korean, we organized and experimented them by collecting approximately IS,999 document banks that are classified into 180 fields. We obtained high precision of extraction that 88,782 single field-associated terms are contracted into 8,405 ones thus recording compression rate as approximately 9$\%$ and recall as above 0.77 (average 0.85), precision as above 0.90 (average 0.94). By applying established field-associated terms to initial determination for document classification and comparing it with filed determination by human being, we got correct answers above approximately 90$\%$. We can use results of research as fundamental research for initial stage and apply it document retrieval between multilingual environment thus utilizing it as fundamental research for multilingual information retrieval.