Representation of ambiguous word in Latent Semantic Analysis

Korean title: LSA모형에서 다의어 의미의 표상 (Representation of the meanings of ambiguous words in the LSA model)

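  • 이태헌 (Department of Psychology, Seoul National University)
  • 김청택 (Department of Psychology, Seoul National University; Interdisciplinary Program in Cognitive Science, Seoul National University)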
  • Published : 2004.06.01

Abstract

Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a technique for representing the meanings of words using co-occurrence information about words that appear in the same context, which is usually a sentence or a document. In LSA, a word is represented as a point in a multidimensional space where each axis represents a context, and a word's meaning is determined by its frequency in each context. The dimensionality of the space is then reduced by singular value decomposition (SVD). The present study extends LSA to the representation of ambiguous words. The proposed LSA applies a rotation of the axes in the document space, which makes it possible to interpret the meaning of the axes. A simulation study was conducted to illustrate the performance of LSA in representing ambiguous words. In the simulation, the texts containing an ambiguous word were first extracted, and LSA with rotation was performed on them. By comparing loadings in the loading matrix, we categorized the texts according to the meanings of the word. The first meaning of the ambiguous word was then represented by applying LSA to the matrix with the vectors for the other meanings excluded, and the other meanings were represented in the same way. The simulation showed that this way of representing an ambiguous word can identify the meanings of the word. This result suggests that LSA with axis rotation can be applied to the representation of ambiguous words. We discuss how the use of rotation makes it possible to represent multiple meanings of ambiguous words and how this technique can be applied to web search.
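The basic LSA construction described above (a word-by-context frequency matrix reduced by SVD) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation: the vocabulary, the counts, and the helper names `lsa_word_vectors` and `cosine` are all invented for the example, and raw counts are used without any weighting.

```python
import numpy as np

# Toy word-by-context count matrix (rows: words, columns: contexts).
# The vocabulary and counts are invented for illustration only.
words = ["bank", "money", "loan", "river", "water"]
counts = np.array([
    [1, 1, 1, 1],   # bank: occurs in finance and river contexts alike
    [1, 1, 0, 0],   # money
    [1, 1, 0, 0],   # loan
    [0, 0, 1, 1],   # river
    [0, 0, 1, 1],   # water
], dtype=float)

def lsa_word_vectors(matrix, k):
    """Return k-dimensional word vectors U_k * S_k from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = lsa_word_vectors(counts, k=2)
idx = {w: i for i, w in enumerate(words)}

sim_money_loan = cosine(vectors[idx["money"]], vectors[idx["loan"]])
sim_money_river = cosine(vectors[idx["money"]], vectors[idx["river"]])
sim_bank_money = cosine(vectors[idx["bank"]], vectors[idx["money"]])
sim_bank_river = cosine(vectors[idx["bank"]], vectors[idx["river"]])

print(sim_money_loan, sim_money_river, sim_bank_money, sim_bank_river)
```

In this toy matrix the single vector for `bank` is equally similar to `money` and to `river` (cosine about 0.71 each), which illustrates why one point per word is a poor representation of an ambiguous word.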

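Latent semantic analysis defines the meaning of a word by its co-occurrence with the other words presented in the same context (sentence or document). In this analysis, a word is represented as a point in a multidimensional space whose axes stand for contexts, and the word's meaning is defined by the frequency with which the word appears in each context. The dimensionality of this semantic space is reduced through SVD so that it represents abstracted meaning. This study extends LSA so that ambiguous words can be represented. The proposed LSA introduces a rotation of the axes so that the axes can be interpreted, which makes the representation of ambiguous words possible. In the simulation, only the documents containing the ambiguous word were first re-collected from the word-by-context frequency table produced by LSA, and these documents were then classified according to the meanings of the ambiguous word. In the second step, the representation of a particular meaning of the ambiguous word was constructed by removing the contexts for the other meanings from the classified word-by-context frequency table and then applying LSA. The simulation results showed that LSA can represent the meanings of an ambiguous word. This suggests that LSA with axis rotation can represent the multiple meanings of ambiguous words and, on the practical side, can also be applied to web search engines.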
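The two-step procedure described above can be sketched as follows, under several assumptions that go beyond the abstract. The abstract does not name the rotation criterion, so raw varimax is assumed here, restricted to two axes so that its closed-form angle can be used. The toy counts and the helper names (`context_loadings`, `varimax_2d`, `word_vectors`, `cosine`) are hypothetical, and each context is assigned to the axis on which it loads most strongly.

```python
import numpy as np

# Toy word-by-context counts for contexts that all contain the ambiguous
# word "bank". Words, contexts, and counts are invented for illustration.
words = ["bank", "money", "loan", "river", "water"]
counts = np.array([
    [1, 1, 1, 1, 1],   # bank: present in every retained context
    [1, 1, 1, 0, 0],   # money
    [1, 1, 1, 0, 0],   # loan
    [0, 0, 0, 1, 1],   # river
    [0, 0, 0, 1, 1],   # water
], dtype=float)

def context_loadings(matrix, k):
    """Context (document) loadings V_k * S_k from a truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:k].T * s[:k]

def varimax_2d(loadings):
    """Raw varimax for exactly two axes, using the closed-form angle.

    Assumption: varimax stands in for whatever rotation the authors used.
    """
    x, y = loadings[:, 0], loadings[:, 1]
    u, v = x**2 - y**2, 2.0 * x * y
    cov_uv = np.mean(u * v) - u.mean() * v.mean()
    phi = 0.25 * np.arctan2(2.0 * cov_uv, u.var() - v.var())
    rot = np.array([[np.cos(phi), -np.sin(phi)],
                    [np.sin(phi),  np.cos(phi)]])
    return loadings @ rot

def word_vectors(matrix, k):
    """Word vectors U_k * S_k from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b, eps=1e-9):
    """Cosine similarity; 0.0 if either vector is numerically zero."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return 0.0
    return float(a @ b / (na * nb))

# Step 1: rotate the context loadings and label each context by the axis
# on which its absolute loading is largest.
rotated = varimax_2d(context_loadings(counts, k=2))
labels = np.argmax(np.abs(rotated), axis=1)

# Step 2: for each sense, drop the other sense's contexts and rerun LSA.
idx = {w: i for i, w in enumerate(words)}
sense_profiles = {}
for label in np.unique(labels):
    sub = counts[:, labels == label]
    vecs = word_vectors(sub, k=2)
    sense_profiles[int(label)] = (
        cosine(vecs[idx["bank"]], vecs[idx["money"]]),
        cosine(vecs[idx["bank"]], vecs[idx["river"]]),
    )

print(labels, sense_profiles)
```

With this toy data the three finance contexts fall on one rotated axis and the two river contexts on the other, so each sense-specific vector for `bank` is close to `money` in one sense and to `river` in the other. Real corpora would need more than two axes and a general rotation routine, such as the procedures of Jennrich listed in the references.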

References

  1. 김청택; 이태헌. Brain and cognitive models: Document classification through latent semantic analysis. 한국심리학회지: 실험 및 인지, v.14.
  2. Browne, M. W. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, v.36.
  3. Eckart, C.; Young, G. The approximation of one matrix by another of lower rank. Psychometrika, v.1.
  4. Jennrich, R. I. A simple general procedure for orthogonal rotation. Psychometrika, v.66.
  5. Jennrich, R. I. A simple general method for oblique rotation. Psychometrika, v.67.
  6. Labov, W. The boundaries of words and their meanings. In C. J. Bailey (ed.); R. W. Shuy (ed.), New ways of analyzing variation in English.
  7. Landauer, T. K.; Dumais, S. T. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, v.104.
  8. Quine, W. V. O. Word and object.
  9. Seidenberg, M. S.; Tanenhaus, M. K.; Leiman, J. M.; Bienkowski, M. Automatic access of the meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive Psychology, v.14.
  10. Simpson, G. B. Meaning dominance and semantic context in the processing of lexical ambiguity. Journal of Verbal Learning and Verbal Behavior, v.20.
  11. Simpson, G. B.; Krueger, M. A. Selective access of homograph meanings in sentence context. Journal of Memory and Language, v.30.
  12. Swinney, D. A. Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior, v.18.
  13. Thurstone, L. L. Multiple factor analysis.
  14. Tabossi, P. Understanding words in context. In G. B. Simpson (ed.), Understanding word and sentence.