Analyzing Errors in Bilingual Multi-word Lexicons Automatically Constructed through a Pivot Language

  • Seo, Hyeong-Won (Department of Computer Engineering, Korea Maritime and Ocean University)
  • Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime and Ocean University)
  • Received : 2014.11.04
  • Reviewed : 2014.12.17
  • Published : 2015.02.28

Abstract

Constructing a bilingual multi-word lexicon faces many difficulties, such as the absence of a commonly accepted gold-standard dataset. Moreover, there is no universally agreed definition of what a multi-word unit is. With these problems in mind, this paper evaluates and analyzes the context vector approach, a novel alignment method for constructing bilingual lexicons from parallel corpora, by comparing it with one of the general methods. The approach builds context vectors for both source and target single-word units from two parallel corpora. To adapt the approach to multi-word units, we first identify all multi-word candidates (noun phrases in this work) and then concatenate each candidate into a single-word unit, so that the context vector approach can be applied to multi-word units as well. In our experiments, the context vector approach outperformed the other approach. The contribution of this paper is an analysis of the various types of errors found in the experimental results. In future work, we will study a similarity measure that covers not only a multi-word unit itself but also its constituents.
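
The adaptation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the underscore-joined token format, the raw co-occurrence counts, and the use of cosine similarity are all assumptions made for illustration, and the paper may use different association scores and similarity measures.

```python
from collections import Counter, defaultdict
from math import sqrt


def merge_multiword(tokens, phrases):
    """Replace each known multi-word candidate with one underscore-joined token.

    `phrases` is a collection of token tuples (hypothetical format).
    """
    # Try longer candidates first so nested phrases resolve to the longest match.
    ordered = sorted((p for p in phrases if p), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for phrase in ordered:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                merged.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No candidate starts here, so keep the token as a single-word unit.
            merged.append(tokens[i])
            i += 1
    return merged


def context_vectors(sentence_pairs):
    """Count, for every unit, the words of the sentences aligned with it.

    Each pair is (units, aligned_words). Raw counts are an assumption.
    """
    vectors = defaultdict(Counter)
    for units, aligned_words in sentence_pairs:
        for unit in units:
            vectors[unit].update(aligned_words)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy input, invented for illustration only.
tokens = ["the", "parallel", "corpus", "is", "large"]
print(merge_multiword(tokens, {("parallel", "corpus")}))
# ['the', 'parallel_corpus', 'is', 'large']
```

After merging, a multi-word unit is treated like any other token, so one set of context vectors can be built from each of the two parallel corpora and a source unit can be compared with a target unit through `cosine`.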

Cited by

  1. Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary, vol. 322, ISSN 1757-899X, 2018, https://doi.org/10.1088/1757-899X/322/5/052054