Assignment of the Semantic Category of a Word Using Word Embedding and Synonyms

  • Park, Da-Sol (Dept. of Eco-Friendly Offshore Plant FEED Engineering, Changwon National University)
  • Cha, Jeong-Won (Dept. of Computer Engineering, Changwon National University)
  • Received : 2017.05.18
  • Accepted : 2017.07.10
  • Published : 2017.09.15

Abstract

Semantic role decision (semantic role labeling) determines the semantic relationship between a predicate and its arguments in natural language processing (NLP). Both semantic role information and semantic category information are needed to make this decision, and the Sejong Electronic Dictionary contains the case-frame information used to determine semantic roles. In this paper, we propose a method for extending the Sejong Electronic Dictionary using word embedding and synonyms. So that related words receive similar vector representations, we also build retrofitted vectors that incorporate information from a synonym dictionary, and we run the same experiments with the original word-embedding vectors and with the retrofitted vectors. For words that do not appear in the Sejong Electronic Dictionary, semantic category assignment reaches 32.19% and the extended semantic category assignment reaches 51.14% with the word-embedding vectors; with the retrofitted vectors, the corresponding results are 33.33% and 53.88%. These results show that assigning semantic categories to new words that lack them helps extend the semantic category entries of the Sejong Electronic Dictionary.
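The retrofitting step mentioned in the abstract pulls each word vector toward the vectors of its synonyms so that related words end up with similar representations. The Python sketch below illustrates one common form of that update (the iterative averaging scheme of Faruqui et al., 2014); the function name, the dictionary-based inputs, and the weights `alpha` and `beta` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def retrofit(vectors, synonyms, alpha=1.0, beta=1.0, iters=10):
    """Nudge each word vector toward the vectors of its synonyms.

    vectors  : dict mapping word -> np.ndarray (pre-trained embedding)
    synonyms : dict mapping word -> list of synonym words
    Update (uniform weights): q_i = (alpha * q_hat_i + beta * sum_j q_j)
                                    / (alpha + beta * |N(i)|)
    """
    new_vecs = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for word, neighbours in synonyms.items():
            neighbours = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not neighbours:
                continue
            # Average of the original vector and the current synonym vectors.
            neighbour_sum = np.sum([new_vecs[n] for n in neighbours], axis=0)
            new_vecs[word] = (alpha * vectors[word] + beta * neighbour_sum) / (
                alpha + beta * len(neighbours))
    return new_vecs
```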

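Given either the original or the retrofitted vectors, a semantic category from the Sejong Electronic Dictionary can then be assigned to an unseen word by comparing it with dictionary words whose categories are already known. The sketch below assumes a k-nearest-neighbour majority vote over cosine similarity; this assignment rule, along with the names `assign_category` and `category_of`, is an assumption for illustration and may differ from the paper's exact method.

```python
import numpy as np
from collections import Counter

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assign_category(word, vectors, category_of, k=5):
    """Assign a semantic category to `word` by majority vote over the k
    most similar dictionary words that already have a category.

    vectors     : dict word -> np.ndarray (plain or retrofitted embedding)
    category_of : dict word -> semantic category from the Sejong dictionary
    """
    if word not in vectors:
        return None
    sims = [(cosine(vectors[word], vectors[w]), cat)
            for w, cat in category_of.items() if w in vectors and w != word]
    if not sims:
        return None
    top = sorted(sims, reverse=True)[:k]          # k most similar known words
    votes = Counter(cat for _, cat in top)        # vote by their categories
    return votes.most_common(1)[0][0]
```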

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
