• Title/Summary/Keyword: Term-pair Indexing


Measurement of Document Similarity using Term/Term-pair Features and Neural Network (단어/단어쌍 특징과 신경망을 이용한 두 문서간 유사도 측정)

  • Kim Hye Sook; Park Sang Cheol; Kim Soo Hyung
    • Journal of KIISE: Software and Applications / v.31 no.12 / pp.1660-1671 / 2004
  • This paper proposes a method for measuring the similarity between two documents. Its central idea is to estimate the degree of similarity from the frequencies of terms and term pairs that occur in both documents. Unlike conventional methods, which take only one feature into account, the proposed method considers several features at once and measures similarity with a neural network. Two experiments were conducted to demonstrate the method's advantage. The first determines whether two input documents originate from the same document. The second is an information-retrieval task in which a document serves as the query against a large document collection. In both experiments, the proposed method achieves higher accuracy than two conventional methods: cosine similarity and a term-pair method.
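The abstract does not specify how the term and term-pair features are defined or what network architecture scores them. As a minimal sketch, assuming that a term pair is any unordered pair of distinct terms co-occurring in one sentence and using a naive regex tokenizer, the Python below builds a small feature vector (term cosine, pair cosine, and counts of shared terms and pairs) of the kind a neural network could take as input. All function names are hypothetical; only the term-level cosine feature corresponds directly to the conventional baseline the abstract names.

```python
import math
import re
from collections import Counter
from itertools import combinations


def sentences(text: str) -> list[list[str]]:
    """Split text into sentences, then into lowercase word tokens (naive, assumed tokenizer)."""
    parts = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z0-9]+", p.lower()) for p in parts if p.strip()]


def term_counts(text: str) -> Counter:
    """Frequency of each term in the document."""
    return Counter(t for sent in sentences(text) for t in sent)


def pair_counts(text: str) -> Counter:
    """Frequency of each unordered term pair sharing a sentence (assumed definition)."""
    counts = Counter()
    for sent in sentences(text):
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts


def cosine(x: Counter, y: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def similarity_features(doc_a: str, doc_b: str) -> list[float]:
    """Hypothetical feature vector that a trained network could map to a similarity score."""
    ta, tb = term_counts(doc_a), term_counts(doc_b)
    pa, pb = pair_counts(doc_a), pair_counts(doc_b)
    return [
        cosine(ta, tb),                     # term-level cosine similarity (baseline feature)
        cosine(pa, pb),                     # term-pair-level cosine similarity
        float(len(ta.keys() & tb.keys())),  # number of terms present in both documents
        float(len(pa.keys() & pb.keys())),  # number of term pairs present in both documents
    ]
```

Two documents made of the same sentences yield cosine features of 1.0, while documents with no terms in common yield all zeros. Choosing and training the network that consumes these vectors is outside the scope of this sketch.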

Study on the Vocabulary Synthesis for Index Term Selection (색인어 선정을 위한 어휘결집력에 관한 연구)

  • Kim, Chul; Jeong, Jun-Min
    • Journal of the Korean Society for Information Management / v.13 no.1 / pp.205-226 / 1996
  • Under the hypothesis that any pair of terms in a sentence is meaningful for representing the context of a paper, this study proposes the Brillouin measure of term relatedness for automatic indexing. In the experiment, pairs of terms that appear together in two or more sentences are extracted from each paper's title and abstract. Compared with the index terms or subject headings suggested by the authors, the terms in the resulting term-relatedness graph show a high degree of match. In particular, ranking terms by synthetic strength proves useful for selecting index terms.
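The pair-extraction step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: the tokenizer is a naive regex split with no stop-word removal, and because the abstract does not say how the Brillouin measure is applied, terms are ranked by their degree in the co-occurrence graph as a hypothetical stand-in for synthetic strength. The function names are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def sentence_terms(text: str) -> list[set[str]]:
    """Naive split of a title and abstract into sentences of lowercase terms (assumed tokenizer)."""
    return [
        set(re.findall(r"[a-z0-9]+", s.lower()))
        for s in re.split(r"[.!?]+", text)
        if s.strip()
    ]


def cooccurring_pairs(text: str, min_sentences: int = 2) -> set[tuple[str, str]]:
    """Unordered term pairs that appear together in at least `min_sentences` sentences."""
    counts = Counter()
    for terms in sentence_terms(text):
        # Each unordered pair is counted at most once per sentence.
        counts.update(combinations(sorted(terms), 2))
    return {pair for pair, n in counts.items() if n >= min_sentences}


def rank_terms(pairs: set[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank graph nodes by degree (hypothetical stand-in for synthetic strength)."""
    degree = defaultdict(int)
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
```

For the text "term pair index. term pair graph. index graph.", only ("pair", "term") co-occurs in two sentences, so `rank_terms` returns `[("pair", 1), ("term", 1)]`. A practical indexer would remove stop words before pairing so that function words do not dominate the graph.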
