Word Sense Classification Using Support Vector Machines

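  • 박준혁 (Department of Computer and Information Engineering, Korea National University of Transportation)
  • 이성욱 (Department of Computer and Information Engineering, Korea National University of Transportation)
  • Received: 2016.10.04
  • Reviewed: 2016.10.12
  • Published: 2016.11.30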

Abstract

Word sense disambiguation is the task of identifying which of a word's several dictionary senses is intended in a given sentence. We treat it as a multi-class classification problem and solve it with Support Vector Machines. The context words of each ambiguous word, extracted from the Sejong sense-tagged corpus, are represented in two kinds of vector space. The first is spanned by the context words themselves and uses binary weights. The second maps the context words, selected according to the context window size, into vectors with a word embedding model. In our experiments, accuracy was about 87.0% with context word vectors and about 86.0% with word embeddings.
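The two context representations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the English example sentence, the toy vocabulary, and the toy two-dimensional embeddings are all hypothetical, and averaging the embeddings of the context words is an assumption, since the abstract does not say how the embedded context words are combined.

```python
from typing import Dict, List, Sequence


def context_window(tokens: Sequence[str], target_index: int, window: int) -> List[str]:
    """Return up to `window` tokens on each side of the target word (target excluded)."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return list(left) + list(right)


def binary_context_vector(context: Sequence[str], vocabulary: Dict[str, int]) -> List[int]:
    """Binary weighting: 1 if a vocabulary word occurs in the context, else 0."""
    vector = [0] * len(vocabulary)
    for word in context:
        index = vocabulary.get(word)
        if index is not None:
            vector[index] = 1
    return vector


def mean_embedding(context: Sequence[str],
                   embeddings: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the embeddings of known context words (assumed combination rule)."""
    known = [embeddings[word] for word in context if word in embeddings]
    if not known:
        return [0.0] * dim  # no known context word: fall back to a zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical example: the ambiguous word is "bank" at index 4.
tokens = ["he", "sat", "on", "the", "bank", "of", "the", "river"]
context = context_window(tokens, target_index=4, window=2)

toy_vocabulary = {"on": 0, "the": 1, "of": 2, "river": 3, "money": 4}
toy_embeddings = {"on": [1.0, 0.0], "of": [0.0, 1.0]}

binary_features = binary_context_vector(context, toy_vocabulary)
embedding_features = mean_embedding(context, toy_embeddings, dim=2)
```

Either feature vector, paired with the sense label of the ambiguous word, could then serve as one training instance for a multi-class SVM classifier.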
