A Clustering Method using Dependency Structure and Part-Of-Speech(POS) for Japanese-English Statistical Machine Translation

Kim, Han-Kyong; Na, Hwi-Dong; Lee, Jin-Ji; Lee, Jong-Hyeok

Journal of KIISE:Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)

Volume 15, Issue 12, pp. 993-997, 2009 (pISSN 1229-7712)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

일영 통계기계번역에서 의존문법 문장 구조와 품사 정보를 사용한 클러스터링 기법

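Kim, Han-Kyong (김한경), Department of Computer Engineering, Pohang University of Science and Technology (POSTECH)
Na, Hwi-Dong (나휘동), Department of Computer Engineering, Pohang University of Science and Technology (POSTECH)
Lee, Jin-Ji (이금희), Department of Computer Engineering, Pohang University of Science and Technology (POSTECH)
Lee, Jong-Hyeok (이종혁), Department of Computer Engineering, Pohang University of Science and Technology (POSTECH)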

Published: 2009.12.15


Abstract

Clustering has been applied in many fields and is a well-established technique in statistical machine translation (SMT). However, previous work has often either applied machine-learning methods without in-depth grammatical analysis or, when it did use sentence-structure information, stopped at identifying structures with regular expressions.

In this paper, we propose a corpus clustering method that determines the structure of each sentence from its dependency-grammar structure and from part-of-speech (POS) information such as particles, classifies sentences by structural type, and trains a language model specialized for each type. We then use these cluster language models as an additional feature in a phrase-based SMT system to improve translation quality.
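The abstract does not specify the exact clustering features, parser output format, or language-model toolkit, so the following Python is only a minimal sketch under stated assumptions. It assumes each Japanese sentence is already parsed into dependency chunks (the `Chunk` structure and its English POS tag names are hypothetical), uses a hypothetical cluster label made of the particles that close chunks depending directly on the root predicate, trains one target-side (English) language model per cluster, and scores a translation hypothesis with the model of its source sentence's cluster. The add-one smoothed bigram model is a placeholder, not the paper's LM. In a phrase-based system, such a score would enter the log-linear model as one more weighted feature, roughly λ_c · log P_c(e), alongside the usual translation, reordering, and general LM features.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """One dependency chunk (bunsetsu) of a parsed Japanese sentence (hypothetical format)."""
    words: list[str]  # surface tokens in the chunk
    pos: list[str]    # one POS tag per token (hypothetical English tag names)
    head: int         # index of the chunk this one depends on; -1 marks the root


# Hypothetical choice of particles treated as structural signals.
CASE_PARTICLES = {"は", "が", "を", "に", "で", "と", "から", "まで"}


def cluster_key(chunks: list[Chunk]) -> tuple[str, ...]:
    """Hypothetical cluster label: the particles that close chunks depending
    directly on the root (main predicate) chunk, in sentence order."""
    root = next(i for i, c in enumerate(chunks) if c.head == -1)
    return tuple(
        c.words[-1]
        for c in chunks
        if c.head == root and c.pos[-1] == "particle" and c.words[-1] in CASE_PARTICLES
    )


class BigramLM:
    """Add-one smoothed bigram model; a stand-in for a real n-gram LM toolkit."""

    def __init__(self, sentences: list[list[str]]):
        self.history = Counter()  # count of each token as a bigram history
        self.bigrams = Counter()
        vocab = {"</s>"}
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
            vocab.update(s)
        self.V = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def logprob(self, sentence: list[str]) -> float:
        """Natural-log probability of a whole sentence, including </s>."""
        toks = ["<s>"] + sentence + ["</s>"]
        return sum(
            math.log((self.bigrams[(h, w)] + 1) / (self.history[h] + self.V))
            for h, w in zip(toks[:-1], toks[1:])
        )


def train_cluster_lms(corpus):
    """Group (source chunks, target tokens) pairs by the source cluster key
    and train one target-side LM per cluster."""
    grouped = defaultdict(list)
    for src, tgt in corpus:
        grouped[cluster_key(src)].append(tgt)
    return {key: BigramLM(tgts) for key, tgts in grouped.items()}


def cluster_lm_feature(lms, fallback, src, hypothesis):
    """Extra feature log P_c(e) for a hypothesis e of source f;
    source structures with no cluster fall back to a general LM."""
    return lms.get(cluster_key(src), fallback).logprob(hypothesis)


# Toy usage: two parsed Japanese sentences with English translations.
reads = [Chunk(["私", "は"], ["noun", "particle"], 2),
         Chunk(["本", "を"], ["noun", "particle"], 2),
         Chunk(["読む"], ["verb"], -1)]
sleeps = [Chunk(["猫", "が"], ["noun", "particle"], 1),
          Chunk(["寝る"], ["verb"], -1)]
corpus = [(reads, ["i", "read", "a", "book"]),
          (sleeps, ["the", "cat", "sleeps"])]

lms = train_cluster_lms(corpus)
general = BigramLM([tgt for _, tgt in corpus])
print(cluster_key(reads))  # ('は', 'を')
print(cluster_lm_feature(lms, general, reads, ["i", "read", "a", "book"]))
```

Keying clusters on the source side is a design choice made here for illustration: because the input sentence is known before decoding, its cluster, and therefore its specialized LM, can be selected up front. A practical system would replace the toy bigram model with a properly smoothed n-gram model and tune the feature weight on development data.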
