A Corpus-based Hybrid Translation System for Limited Domain

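  • Kang Un-Gu (강운구), Division of Information Engineering, Gachon University of Medicine and Science ;
  • Kim Sung-Hyun (김성현), Development Team, LNI Soft Co., Ltd. ((주)엘엔아이소프트) ;
  • Lee Byung-Mun (이병문), Division of Information Engineering, Gachon University of Medicine and Science ;
  • Lee Young-Ho (이영호), Division of Information Engineering, Gachon University of Medicine and Science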
  • Received : 2010.01.11
  • Accepted : 2010.09.10
  • Published : 2010.11.15

Abstract

This paper proposes a hybrid machine translation system that integrates SMT, RBMT, and PBMT in a serial manner. The SMT component is implemented as a quasi-syntax-based system that performs a monotone search over a preprocessed source-language string. Preprocessing consists of rule-based reordering, named-entity (NE) recognition, clausal splitting, and appending pattern translation information to the end of the input text. For long, complex sentences, clausal splitting produced better translations than unsplit input.

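This paper proposes a serially connected hybrid translation system that combines RBMT, SMT, and PBMT. The system first parses the input sentence and uses the parse to perform syntactic transfer and named-entity recognition. The result is converted into a pseudo-sentence. When a sentence-splitting rule applies, each split segment is decoded separately (multiple decoding), and the postprocessor assembles the final translation according to joining rules. Experiments showed that, for word ordering, applying rule-based syntactic transfer rules is more effective than relying on the distortion model.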
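The serial flow described above (split, reorder by rule, decode each segment monotonically, then join) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `/` boundary marker, the single verb-fronting rule, and the toy lexicon are hypothetical and do not come from the paper, and splitting is applied before reordering only to keep the example short. NE recognition and pattern translation are omitted.

```python
from typing import Dict, List

# Hypothetical toy lexicon; a real system would use a trained phrase table.
TOY_TABLE: Dict[str, str] = {
    "s1": "S1", "o1": "O1", "v1": "V1",
    "s2": "S2", "o2": "O2", "v2": "V2",
}

CLAUSE_MARKER = "/"  # assumed boundary token emitted by a splitting rule


def split_clauses(tokens: List[str]) -> List[List[str]]:
    """Split a token list into clauses at each boundary marker."""
    clauses: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == CLAUSE_MARKER:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def reorder_verb_final(clause: List[str]) -> List[str]:
    """Toy transfer rule: move the clause-final token to second position."""
    if len(clause) < 3:
        return list(clause)
    return [clause[0], clause[-1]] + clause[1:-1]


def monotone_decode(clause: List[str], table: Dict[str, str]) -> List[str]:
    """Translate left to right with no reordering; unknown tokens pass through."""
    return [table.get(tok, tok) for tok in clause]


def translate(tokens: List[str]) -> str:
    """Run the stages serially: split, reorder, decode each clause, join."""
    decoded = [
        " ".join(monotone_decode(reorder_verb_final(clause), TOY_TABLE))
        for clause in split_clauses(tokens)
    ]
    return ", ".join(decoded)


result = translate(["s1", "o1", "v1", "/", "s2", "o2", "v2"])
print(result)  # S1 V1 O1, S2 V2 O2
```

Because word order is fixed by the rule before decoding, the decoder in this sketch never needs to permute tokens, which is the intuition behind preferring transfer rules over a distortion model.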
