Bilingual Multiword Expression Alignment by Constituent-Based Similarity Score

  • Received : 2015.06.19
  • Accepted : 2016.03.25
  • Published : 2016.09.30

Abstract

This paper presents a constituent-based approach for aligning bilingual multiword expressions, such as noun phrases. The approach considers the relationship not only between source expressions and their target translation equivalents but also between the source expressions and the constituents of those target equivalents. We considered only the compositional preferences of multiword expressions, not their idiomatic usages, because our multiword identification method focuses on collocational or compositional preferences. In our experiments, the constituent-based approach performed much better than the general method for extracting bilingual multiword expressions. In future work, we will examine which scoring method gives the constituent-based approach its best performance. We will also extend the target entries in the evaluation dictionaries by considering their synonyms.
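The idea of scoring a candidate pair against both the whole target equivalent and its constituents can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the sparse vectors are hypothetical context representations already mapped into a shared feature space, cosine similarity stands in for whatever association measure is actually used, and the function names `cosine` and `constituent_score` and the interpolation parameter `weight` are invented here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def constituent_score(src_vec, tgt_vec, constituent_vecs, weight=0.5):
    """Hypothetical constituent-based score (an assumed form, not the paper's).

    Interpolates between
      - the similarity of the source expression and the whole target expression, and
      - the mean similarity of the source expression and each target constituent.
    """
    whole = cosine(src_vec, tgt_vec)
    if not constituent_vecs:
        # No constituents available: fall back to the whole-expression score.
        return whole
    parts = sum(cosine(src_vec, c) for c in constituent_vecs) / len(constituent_vecs)
    return weight * whole + (1.0 - weight) * parts


# Toy vectors, invented for illustration only.
source = {"f1": 1.0}
target = {"f1": 1.0}
constituents = [{"f1": 1.0}, {"f2": 1.0}]
score = constituent_score(source, target, constituents)
```

In this toy setup the whole-expression similarity is 1.0 and the mean constituent similarity is 0.5, so the default `weight` of 0.5 yields a score of 0.75. How the two terms should actually be combined is exactly the open question the abstract leaves for future work.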
