An Iterative Approach to Graph-based Word Sense Disambiguation Using Word2Vec

  • O, Dongsuk (Computer Science and Engineering Sogang University) ;
  • Kang, Sangwoo (Computer Science and Engineering Sogang University) ;
  • Seo, Jungyun (Computer Science and Engineering Sogang University)
  • Received : 2016.03.16
  • Accepted : 2016.03.17
  • Published : 2016.03.31

Abstract

Recent research on unsupervised word sense disambiguation has focused on graph-based methods, which build a semantic graph from the words that co-occur with an ambiguous word in its context or sentence. However, building such a graph over all ambiguous words at once adds unnecessary nodes and edges, which increases errors. Our work instead uses Word2Vec to match each ambiguous word with the most similar word in its context or sentence, and then rebuilds the graph over the matched words. Using Word2Vec in this way, our method achieves a higher F1-measure than previous methods.
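The matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation. The three-dimensional vectors in `VECTORS` are made-up stand-ins for trained Word2Vec embeddings. `match_most_similar` is a hypothetical helper that pairs each ambiguous word with the context word whose vector has the highest cosine similarity to it.

```python
import math

# Made-up toy vectors standing in for trained Word2Vec embeddings (assumption:
# real vectors would be loaded from a trained model).
VECTORS = {
    "bank":  [0.9, 0.1, 0.3],
    "river": [0.8, 0.2, 0.4],
    "money": [0.1, 0.9, 0.2],
    "loan":  [0.2, 0.8, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_most_similar(ambiguous, context, vectors):
    """Hypothetical helper: pair each ambiguous word with its most similar context word."""
    pairs = {}
    for word in ambiguous:
        candidates = [w for w in context if w != word and w in vectors]
        if word not in vectors or not candidates:
            continue  # no vector or no partner: leave the word unmatched
        pairs[word] = max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))
    return pairs

pairs = match_most_similar(["bank", "money"], ["bank", "river", "money", "loan"], VECTORS)
print(pairs)  # {'bank': 'river', 'money': 'loan'}
```

With these toy vectors, `bank` is matched with `river` and `money` with `loan`. With real embeddings, only the contents of `VECTORS` would change.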

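Research on unsupervised, knowledge-based word sense disambiguation has concentrated on graph-based methods. A graph-based method builds a semantic graph from an ambiguous word and the words that co-occur with it in the context or sentence, and it resolves the ambiguity by examining the connections in that graph. However, building a semantic graph over all ambiguous words at once adds unnecessary edges and nodes, which increases errors. To address this problem, this study uses an iterative approach to graph-based word sense disambiguation. The approach matches the ambiguous words to one another according to a given criterion and then repeatedly rebuilds the graph over the matched words to resolve their senses. Specifically, we use Word2Vec to match each ambiguous word with the word in the context or sentence that is semantically most similar to it, and we determine the sense of each ambiguous word by rebuilding the graph for the matched words in order. As a result, using Word2Vec word-vector information yields higher performance than both previously studied graph-based methods and the earlier iterative graph-based method.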
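The iterative rebuild can likewise be sketched, under clearly stated assumptions. The sense inventory `SENSES` and the relation set `RELATED` are hypothetical stand-ins for a lexical knowledge base such as WordNet, and the matched pairs are given directly. Degree centrality is an assumed choice of connectivity measure, because the abstract does not say which measure the authors use. For each matched pair, the sketch builds a small graph over the candidate senses of the two words and picks each word's highest-degree sense. It then fixes that sense, so every later graph is rebuilt with it.

```python
from itertools import product

# Hypothetical toy sense inventory (stand-in for WordNet-style senses).
SENSES = {
    "bank":  ["bank%finance", "bank%river"],
    "river": ["river%stream"],
    "money": ["money%currency"],
    "loan":  ["loan%credit", "loan%word"],
}

# Hypothetical undirected sense relations from a lexical knowledge base.
RELATED = {
    frozenset({"bank%river", "river%stream"}),
    frozenset({"bank%finance", "money%currency"}),
    frozenset({"money%currency", "loan%credit"}),
}

def degree_in_pair_graph(word_a, word_b, senses):
    """Build a graph over the candidate senses of two words; return each node's degree."""
    degree = {s: 0 for s in senses[word_a] + senses[word_b]}
    for s_a, s_b in product(senses[word_a], senses[word_b]):
        if frozenset({s_a, s_b}) in RELATED:
            degree[s_a] += 1
            degree[s_b] += 1
    return degree

def iterative_disambiguate(pairs, inventory):
    """Resolve matched pairs in order, fixing each chosen sense before the next graph."""
    senses = {w: list(s) for w, s in inventory.items()}  # working copy, narrowed as we go
    chosen = {}
    for word_a, word_b in pairs:
        degree = degree_in_pair_graph(word_a, word_b, senses)
        for word in (word_a, word_b):
            # Highest degree wins; max() keeps the first listed sense on ties.
            best = max(senses[word], key=lambda s: degree[s])
            chosen[word] = best
            senses[word] = [best]  # later graphs see only the fixed sense
    return chosen

result = iterative_disambiguate([("bank", "river"), ("money", "loan")], SENSES)
print(result)
# {'bank': 'bank%river', 'river': 'river%stream', 'money': 'money%currency', 'loan': 'loan%credit'}
```

Fixing senses makes the processing order matter. If `("bank", "river")` is resolved first, a later `("bank", "money")` graph contains only `bank%river`. Resolving `("bank", "money")` on its own would instead select `bank%finance`.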
