• Title/Summary/Keyword: Hybrid machine translation

Search Result 11, Processing Time 0.029 seconds

Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation

  • Park, Eun-Jin;Kwon, Oh-Woog;Kim, Kangil;Kim, Young-Kil
    • ETRI Journal
    • /
    • v.37 no.3
    • /
    • pp.541-550
    • /
    • 2015
  • In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rulebased machine translation. Both the training dataset used in the learning of our proposed classifier and our feature extraction method affect the hybridization quality. To create one such training dataset, a previous approach used auto-evaluation metrics to determine from a set of component machine translation (MT) systems which gave the more accurate translation (by a comparative method). Once this had been determined, the most accurate translation was then labelled in such a way so as to indicate the MT system from which it came. In this previous approach, when the metric evaluation scores were low, there existed a high level of uncertainty as to which of the component MT systems was actually producing the better translation. To relax such uncertainty or error in classification, we propose an alternative approach to such labeling; that is, a cut-off method. In our experiments, using the aforementioned cut-off method in our proposed classifier, we managed to achieve a translation accuracy of 81.5% - a 5.0% improvement over existing methods.

A Corpus-based Hybrid Translation System for Limited Domain (제한된 도메인을 위한 코퍼스 기반의 하이브리드 번역 시스템)

  • Kang, Un-Gu;Kim, Sung-Hyun;Lee, Byung-Mun;Lee, Young-Ho
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.11
    • /
    • pp.826-836
    • /
    • 2010
  • This paper proposes a hybrid machine translation system which integrates SMT, RBMT, and PBMT in serial manner. SMT in our project has been implemented as a Quasi-syntax-based system where monotone search is done, given a preprocessed string of foreign language. Preprocessing includes rule-based reordering, NE recognition, clausal splitting, and attaching pattern translation information at the end of the input text. For lengthy & complex sentences, clausal splitting turned out to generate better translation than normal input.

A Hybrid Method of Verb disambiguation in Machine Translation (기계번역에서 동사 모호성 해결에 관한 하이브리드 기법)

  • Moon, Yoo-Jin;Martha Palmer
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.3
    • /
    • pp.681-687
    • /
    • 1998
  • The paper presents a hybrid mcthod for disambiguation of the verb meaning in the machine translation. The presented verb translation algorithm is to perform the concept-based method and the statistics-based method simultaneously. It uses a collocation dictionary, WordNct and the statistical information extracted from corpus. In the transfer phase of the machine translation, it tries to find the target word of the source verb. If it fails, it refers to Word Net to try to find it by calculating word similarities between the logical constraints of the source sentence and those in the collocation dictionary. At the same time, it refers to the statistical information extracted from corpus to try to find it by calculating co-occurrence similarity knowledge. The experimental result shows that the algorithm performs more accurate verb translation than the other algorithms and improves accuracy of the verb translation by 24.8% compared to the collocation-based method.

  • PDF

A Hybrid Sentence Alignment Method for Building a Korean-English Parallel Corpus (한영 병렬 코퍼스 구축을 위한 하이브리드 기반 문장 자동 정렬 방법)

  • Park, Jung-Yeul;Cha, Jeong-Won
    • MALSORI
    • /
    • v.68
    • /
    • pp.95-114
    • /
    • 2008
  • The recent growing popularity of statistical methods in machine translation requires much more large parallel corpora. A Korean-English parallel corpus, however, is not yet enoughly available, little research on this subject is being conducted. In this paper we present a hybrid method of aligning sentences for Korean-English parallel corpora. We use bilingual news wire web pages, reading comprehension materials for English learners, computer-related technical documents and help files of localized software for building a Korean-English parallel corpus. Our hybrid method combines sentence-length based and word-correspondence based methods. We show the results of experimentation and evaluate them. Alignment results from using a full translation model are very encouraging, especially when we apply alignment results to an SMT system: 0.66% for BLEU score and 9.94% for NIST score improvement compared to the previous method.

  • PDF

An Analysis System of Prepositional Phrases in English-to-Korean Machine Translation (영한 기계번역에서 전치사구를 해석하는 시스템)

  • Gang, Won-Seok
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.7
    • /
    • pp.1792-1802
    • /
    • 1996
  • The analysis of prepositional phrases in English-to Korean machine translation has problem on the PP-attachment resolution, semantic analysis, and acquisition of information. This paper presents an analysis system for prepositional phrases, which solves the problem. The analysis system consists of the PP-attachment resolution hybrid system, semantic analysis system, and semantic feature generator that automatically generates input information. It provides objectiveness in analyzing prepositional phrases with the automatic generation of semantic features. The semantic analysis system enables to generate natural Korean expressions through selection semantic roles of prepositional phrases. The PP-attachment resolution hybrid system has the merit of the rule-based and neural network-based method.

  • PDF

Sign Language Translation Using Deep Convolutional Neural Networks

  • Abiyev, Rahib H.;Arslan, Murat;Idoko, John Bush
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.2
    • /
    • pp.631-653
    • /
    • 2020
  • Sign language is a natural, visually oriented and non-verbal communication channel between people that facilitates communication through facial/bodily expressions, postures and a set of gestures. It is basically used for communication with people who are deaf or hard of hearing. In order to understand such communication quickly and accurately, the design of a successful sign language translation system is considered in this paper. The proposed system includes object detection and classification stages. Firstly, Single Shot Multi Box Detection (SSD) architecture is utilized for hand detection, then a deep learning structure based on the Inception v3 plus Support Vector Machine (SVM) that combines feature extraction and classification stages is proposed to constructively translate the detected hand gestures. A sign language fingerspelling dataset is used for the design of the proposed model. The obtained results and comparative analysis demonstrate the efficiency of using the proposed hybrid structure in sign language translation.

Corpus-Based Ontology Learning for Semantic Analysis (의미 분석을 위한 말뭉치 기반의 온톨로지 학습)

  • 강신재
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.9 no.1
    • /
    • pp.17-23
    • /
    • 2004
  • This paper proposes to determine word senses in Korean language processing by corpus-based ontology learning. Our approach is a hybrid method. First, we apply the previously-secured dictionary information to select the correct senses of some ambiguous words with high precision, and then use the ontology to disambiguate the remaining ambiguous words. The mutual information between concepts in the ontology was calculated before using the ontology as knowledge for disambiguating word senses. If mutual information is regarded as a weight between ontology concepts, the ontology can be treated as a graph with weighted edges, and then we locate the least weighted path from one concept to the other concept. In our practical machine translation system, our word sense disambiguation method achieved a 9% improvement over methods which do not use ontology for Korean translation.

  • PDF

Korean-Chinese Person Name Translation for Cross Language Information Retrieval

  • Wang, Yu-Chun;Lee, Yi-Hsun;Lin, Chu-Cheng;Tsai, Richard Tzong-Han;Hsu, Wen-Lian
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.489-497
    • /
    • 2007
  • Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of name entity in Korean-Chinese cross language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one syllable may map to several Chinese characters. We propose an effective hybrid person name translation method to improve the performance of KCIR. First, we use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. Second, we adopt the Naver people search engine to find the query name's Chinese or English translation. Third, we extract Korean-English transliteration pairs from Google snippets, and then search for the English-Chinese transliteration in the database of Taiwan's Central News Agency or in Google. The performance of KCIR using our method is over five times better than that of a dictionary-based system. The mean average precision is 0.3490 and the average recall is 0.7534. The method can deal with Chinese, Japanese, Korean, as well as non-CJK person name translation from Korean to Chinese. Hence, it substantially improves the performance of KCIR.

  • PDF

Comparison Thai Word Sense Disambiguation Method

  • Modhiran, Teerapong;Kruatrachue, Boontee;Supnithi, Thepchai
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1307-1312
    • /
    • 2004
  • Word sense disambiguation is one of the most important problems in natural language processing research topics such as information retrieval and machine translation. Many approaches can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledge-based, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy. The purpose of this paper is to compare three famous machine learning techniques, Snow, SVM and Naive Bayes in Word-Sense Disambiguation on Thai language. 10 ambiguous words are selected to test with word and POS features. The results show that SVM algorithm gives the best results in solving of Thai WSD and the accuracy rate is approximately 83-96%.

  • PDF

A Hybrid N-best Part-of-Speech Tagger for English-Korean Machine Translation (영한 기계 번역을 위한 혼합형 N-best 품사 태거)

  • Lim, Heui-Seok;Kwon, Cheol-Joong;Lee, Jae-Won;Oh, Ki-Eun
    • Annual Conference on Human and Language Technology
    • /
    • 1998.10c
    • /
    • pp.15-19
    • /
    • 1998
  • 기계 번역 시스템에서 품사 태거의 오류는 전체번역 정확률에 결정적인 영향을 미친다. 따라서 어휘 단계의 정보만으로는 중의성 해소가 불가능한 단어에 대해서는 중의성 해소에 충분한 정보를 얻을 수 있는 구문 분석이나 의미 분석 단계까지 완전한 중의성 해소를 유보하는 N-best 품사 태거가 요구된다. 또한 N-best 품사 태거는 단어에 할당되는 평균 품사 개수를 최소화함으로써 상위 단계의 부하를 줄이는 본연의 역할을 수행하여야 한다. 본 논문은 통계 기반 품사 태깅 방법을 이용하여 N-best 후보를 선정하고, 선정된 N-best 후보에 언어 규칙을 적용하여 중의성을 감소시키거나 오류를 보정하는 혼합형 N-best 품사 태깅 방법을 제안한다 제안된 N-best 품사 태거는 6만여 단어의 영어 코퍼스에서 실험한 결과, 단어 당 평균 1.09개의 품사를 할당할 때 0.43%의 오류율을 보인다.

  • PDF