Using Semantic Knowledge in the Uyghur-Chinese Person Name Transliteration

  • Murat, Alim (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Osman, Turghun (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Yang, Yating (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Zhou, Xi (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Wang, Lei (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Li, Xiao (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences)
  • Received : 2016.04.21
  • Accepted : 2017.01.31
  • Published : 2017.08.31

Abstract

In this paper, we propose a transliteration approach based on semantic information (i.e., language origin and gender) that is automatically learned from the person name, with the aim of transliterating Uyghur person names into Chinese. The proposed approach integrates semantic scores (i.e., performance on language origin and gender detection) with a general transliteration model, yielding a semantic knowledge-based model that can produce the best candidate transliteration results. In the experiments, we use datasets that contain person names of two language origins: Uyghur and Chinese. The results show that the proposed semantic transliteration model substantially outperforms the general transliteration model and greatly improves mean reciprocal rank (MRR) on both datasets, and that it can aid in developing more efficient transliteration for named entities.
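As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how mean reciprocal rank could be computed over ranked transliteration candidates. It assumes one reference transliteration per source name and exact string matching; the function name `mean_reciprocal_rank` and the placeholder strings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Average, over all source names, of 1/rank of the first correct candidate.

    A name whose reference does not appear in its candidate list contributes 0.
    Assumption: a single reference per name (hypothetical simplification).
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("each source name needs exactly one reference")
    if not references:
        return 0.0

    total = 0.0
    for candidates, reference in zip(ranked_candidates, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == reference:
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(references)


# Placeholder data: three source names with ranked candidate lists.
candidates = [["a", "b"], ["x", "y", "z"], ["p"]]
references = ["a", "y", "q"]
mrr = mean_reciprocal_rank(candidates, references)  # (1 + 1/2 + 0) / 3
```

A higher value means the correct transliteration tends to appear nearer the top of the candidate list, which is why the metric suits a model that outputs several ranked candidates per name.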

Acknowledgement

Supported by: Natural Science Foundation of Xinjiang
