A Clustering Method Using Dependency Structure and Part-of-Speech (POS) for Japanese-English Statistical Machine Translation

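A Clustering Technique Using Dependency-Grammar Sentence Structure and Part-of-Speech Information for Japanese-English Statistical Machine Translation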

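  • 김한경 (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • 나휘동 (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • 이금희 (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • 이종혁 (Department of Computer Science and Engineering, Pohang University of Science and Technology)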
  • Published : 2009.12.15

Abstract

Clustering is a well-known method that can be used in statistical machine translation. In this paper, we propose a corpus clustering method that uses the syntactic structure and POS information of dependency grammar. We then use the resulting cluster language models as an additional feature in a phrase-based statistical machine translation system to improve translation quality.

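Clustering has been used in many fields and is a familiar technique in statistical machine translation as well. However, previous studies often applied machine learning techniques without in-depth grammatical analysis or, even when they used sentence-structure information, went no further than identifying it with regular expressions. In this paper, we propose a method that identifies sentence structure from each sentence's dependency-grammar structure and POS information such as particles, classifies sentences by type, and obtains a language model specialized for each type. We also propose a method that uses these models as additional information in phrase-based statistical machine translation to improve translation performance.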
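The idea of grouping sentences by structure and training one language model per group can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` format, the `signature` rule (POS of the root chunk plus the particles of its direct dependents), the romanized toy corpus, and the add-one-smoothed unigram model are all assumptions made for illustration. A real system would take its input from a dependency parser and would use a higher-order n-gram model.

```python
import math
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    """Hypothetical pre-parsed chunk; a real parser's output would differ."""
    text: str                # content word of the chunk
    pos: str                 # POS tag of the content word
    particle: Optional[str]  # trailing particle, if any
    head: int                # index of the governing chunk, -1 for the root


def words(sentence):
    """Flatten a sentence into its surface words (content words and particles)."""
    out = []
    for chunk in sentence:
        out.append(chunk.text)
        if chunk.particle is not None:
            out.append(chunk.particle)
    return out


def signature(sentence):
    """Toy structural key: root POS plus sorted particles of the root's dependents."""
    root = next(i for i, c in enumerate(sentence) if c.head == -1)
    particles = tuple(sorted(
        c.particle for c in sentence
        if c.head == root and c.particle is not None
    ))
    return (sentence[root].pos, particles)


def cluster_corpus(corpus):
    """Group sentences that share the same structural signature."""
    clusters = defaultdict(list)
    for sentence in corpus:
        clusters[signature(sentence)].append(sentence)
    return dict(clusters)


def train_unigram_lm(sentences, vocab_size):
    """Return a function giving the add-one-smoothed unigram log-probability."""
    counts = Counter(w for s in sentences for w in words(s))
    total = sum(counts.values())

    def logprob(sentence):
        return sum(
            math.log((counts[w] + 1) / (total + vocab_size))
            for w in words(sentence)
        )
    return logprob


# Romanized toy corpus (an assumption, for illustration only).
corpus = [
    [Chunk("kare", "noun", "wa", 2), Chunk("hon", "noun", "o", 2),
     Chunk("yomu", "verb", None, -1)],
    [Chunk("kanojo", "noun", "wa", 2), Chunk("ringo", "noun", "o", 2),
     Chunk("taberu", "verb", None, -1)],
    [Chunk("sora", "noun", "ga", 1), Chunk("aoi", "adjective", None, -1)],
]

clusters = cluster_corpus(corpus)
# Shared vocabulary size (plus one slot for unseen words) keeps scores comparable.
vocab_size = len({w for s in corpus for w in words(s)}) + 1
models = {sig: train_unigram_lm(sents, vocab_size) for sig, sents in clusters.items()}
```

Under these assumptions, the score `models[signature(s)](s)` of a candidate sentence `s` is the kind of quantity that could be supplied to a phrase-based decoder as one additional feature alongside the general language model score.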
