• Title/Summary/Keyword: Model Translation

Search Results: 471

Simultaneous neural machine translation with a reinforced attention mechanism

  • Lee, YoHan; Shin, JongHun; Kim, YoungKil
    • ETRI Journal / Vol. 43 No. 5 / pp. 775-786 / 2021
  • To translate in real time, a simultaneous translation system must determine when to stop reading source tokens and generate target tokens for the partial source sentence read up to that point. However, conventional attention-based neural machine translation (NMT) models cannot produce translations with adequate latency in online scenarios because they wait until a source sentence is complete before computing the alignment between source and target tokens. To address this issue, we propose a reinforcement learning (RL)-based attention mechanism, the reinforced attention mechanism, which allows a neural translation model to jointly train the stopping criterion and a partial translation model. The proposed attention mechanism comprises two modules, one to ensure translation quality and the other to reduce latency. Unlike previous RL-based simultaneous translation systems, which learn the stopping criterion from a fixed NMT model, the modules can be trained jointly with a novel reward function. In our experiments, the proposed model achieves better translation quality and comparable latency relative to previous models.
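The joint quality/latency objective the abstract describes can be sketched as a reward that trades translation quality against a lagging penalty. This is a minimal illustration, not the paper's implementation: the function names, the simplified lagging measure, and the weight `alpha` are all assumptions.

```python
def lagging(read_counts, src_len, tgt_len):
    # read_counts[t] = number of source tokens read before emitting target token t.
    # Measures how far the READ/WRITE policy lags behind the ideal diagonal
    # (a simplified variant of the Average Lagging latency metric).
    gamma = tgt_len / src_len
    return sum(g - t / gamma for t, g in enumerate(read_counts)) / len(read_counts)

def reward(quality_score, read_counts, src_len, tgt_len, alpha=0.1):
    # Combined RL reward: a quality term (e.g., a sentence-level BLEU proxy)
    # minus a latency penalty, so policy and translation train jointly.
    return quality_score - alpha * max(0.0, lagging(read_counts, src_len, tgt_len))
```

With equal quality, a policy that emits along the diagonal (`[1, 2, 3]`) is rewarded more than one that waits for the full source (`[3, 3, 3]`), which is exactly the pressure that teaches the stopping criterion.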

통계 정보를 이용한 전치사 최적 번역어 결정 모델 (A Statistical Model for Choosing the Best Translation of Prepositions)

  • 심광섭
    • 한국언어정보학회지:언어와정보 / Vol. 8 No. 1 / pp. 101-116 / 2004
  • This paper proposes a statistical model for the translation of prepositions in English-Korean machine translation. In the proposed model, statistical information acquired from unlabeled Korean corpora is used to choose the best translation from several possible translations. Such information includes functional word-verb co-occurrence information, functional word-verb distance information, and noun-postposition co-occurrence information. The model was evaluated with 443 sentences, each of which has a prepositional phrase, and we attained 71.3% accuracy.

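The selection step the abstract describes, choosing among candidate translations by combining co-occurrence statistics, can be sketched as follows. The counts, the romanized tokens, and the multiplicative scoring are invented for illustration; the paper's actual statistics and combination method may differ.

```python
# Toy co-occurrence tables of the kinds the abstract names:
# noun–postposition counts and functional word–verb counts.
noun_post_cooc = {
    ("hakgyo", "-eseo"): 50,   # "school" + locative "-eseo"
    ("hakgyo", "-e"): 30,      # "school" + "-e"
}
post_verb_cooc = {
    ("-eseo", "gongbuhada"): 40,   # "-eseo" + "to study"
    ("-e", "gongbuhada"): 5,
}

def best_postposition(noun, verb, candidates):
    # Score each candidate Korean postposition by combining the two
    # co-occurrence signals; pick the highest-scoring translation.
    def score(p):
        return noun_post_cooc.get((noun, p), 0) * post_verb_cooc.get((p, verb), 1)
    return max(candidates, key=score)
```

For the toy data above, `best_postposition("hakgyo", "gongbuhada", ["-eseo", "-e"])` prefers `-eseo`, since both statistics favor it.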

어휘 번역확률과 질의개념연관도를 반영한 검색 모델 (Retrieval Model Based on Word Translation Probabilities and the Degree of Association of Query Concept)

  • 김준길; 이경순
    • 정보처리학회논문지B / Vol. 19B No. 3 / pp. 183-188 / 2012
  • In information retrieval, a major cause of performance degradation is the vocabulary mismatch between a user's query and the documents to be retrieved. To address this vocabulary mismatch problem, this paper proposes a retrieval model that incorporates the degree of association of query concepts into a translation-based language model built on word translation probabilities. To acquire word-relationship information, word translation probabilities were computed from sentence/next-sentence pairs. To validate the proposed model, experiments were conducted on the TREC AP collection. The results show that the proposed model substantially outperforms the standard language model and also outperforms the translation-based language model.
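The translation-based language model the abstract builds on scores a document by letting document words "translate" into query words. A toy sketch, with invented translation probabilities and smoothing constants rather than the paper's trained values:

```python
import math

def doc_score(query_terms, doc_terms, trans_prob, mu=0.9):
    """Translation-based LM retrieval score:
    log P(q|D) = sum_q log( mu * sum_w P(q|w) * P(w|D) + (1-mu) * eps ),
    where P(w|D) is the document's term frequency and trans_prob maps
    (document word w, query word q) -> P(q|w)."""
    score, n = 0.0, len(doc_terms)
    for q in query_terms:
        p = sum(trans_prob.get((w, q), 0.0) * doc_terms.count(w) / n
                for w in set(doc_terms))
        score += math.log(mu * p + (1 - mu) * 1e-3)  # crude smoothing for unseen terms
    return score
```

Because `trans_prob` can assign mass from a related word to a query word, a document can match a query even without sharing its exact vocabulary, which is the point of translation-based retrieval.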

Optimized Chinese Pronunciation Prediction by Component-Based Statistical Machine Translation

  • Zhu, Shunle
    • Journal of Information Processing Systems / Vol. 17 No. 1 / pp. 203-212 / 2021
  • To eliminate ambiguities in the existing methods and simplify Chinese pronunciation learning, we propose a model that predicts the pronunciation of Chinese characters automatically. The proposed model relies on a statistical machine translation (SMT) framework. In particular, we treat the components of Chinese characters as the basic unit and frame pronunciation prediction as a machine translation procedure (the component sequence as the source sentence and the pronunciation, pinyin, as the target sentence). In addition to traditional features such as bidirectional word translation and an n-gram language model, we also implement a component similarity feature to tolerate typos during practical use. We incorporate these features into a log-linear model. The experimental results show that our approach significantly outperforms other baseline models.
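The log-linear combination the abstract mentions has a standard shape: each candidate output is scored by a weighted sum of log feature values. A minimal sketch, with feature names, weights, and candidate values invented for illustration:

```python
import math

def loglinear_score(features, weights):
    # score(e|f) = sum_i lambda_i * log h_i(e, f)
    # features: name -> positive feature value; weights: name -> lambda_i
    return sum(weights[name] * math.log(value)
               for name, value in features.items())

def best_candidate(candidates, weights):
    # Decoding reduces to picking the candidate with the highest score.
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))
```

For example, with hypothetical translation-model (`tm`), language-model (`lm`), and component-similarity (`sim`) features, the candidate with stronger feature values wins.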

English-Korean speech translation corpus (EnKoST-C): Construction procedure and evaluation results

  • Jeong-Uk Bang; Joon-Gyu Maeng; Jun Park; Seung Yun; Sang-Hun Kim
    • ETRI Journal / Vol. 45 No. 1 / pp. 18-27 / 2023
  • We present an English-Korean speech translation corpus, named EnKoST-C. End-to-end model training for speech translation tasks often suffers from a lack of parallel data, such as speech data in the source language paired with equivalent text data in the target language. Most publicly available speech translation corpora were developed for European languages, and there is currently no public corpus for English-Korean end-to-end speech translation. We therefore created EnKoST-C, centered on TED Talks. In this process, we enhanced the sentence alignment approach using subtitle time information and bilingual sentence embedding information. As a result, we built a 559-h English-Korean speech translation corpus. The proposed sentence alignment approach achieved an excellent f-measure of 0.96. We also report the baseline performance of an English-Korean speech translation model trained on EnKoST-C. EnKoST-C is freely available on a Korean government open data hub site.
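The alignment idea, combining subtitle time information with bilingual sentence embeddings, can be sketched as a pair-scoring function. The 50/50 weighting and the toy two-dimensional vectors are assumptions; the corpus construction used real bilingual sentence embeddings.

```python
def time_overlap(a, b):
    # a, b: (start, end) subtitle spans in seconds; intersection-over-union.
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def align_score(seg_en, seg_ko, w_time=0.5):
    # Candidate English/Korean subtitle pairs are scored by temporal
    # overlap plus embedding similarity; high-scoring pairs are aligned.
    return (w_time * time_overlap(seg_en["span"], seg_ko["span"])
            + (1 - w_time) * cosine(seg_en["vec"], seg_ko["vec"]))
```

Combining both signals is what makes the alignment robust: time overlap alone fails when subtitles are re-segmented, and embeddings alone fail on repeated or formulaic sentences.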

언어적 특성과 서비스를 고려한 딥러닝 기반 한국어 방언 기계번역 연구 (Deep Learning-based Korean Dialect Machine Translation Research Considering Linguistics Features and Service)

  • 임상범; 박찬준; 양영욱
    • 한국융합학회논문지 / Vol. 13 No. 2 / pp. 21-29 / 2022
  • Motivated by the importance of dialect research, preservation, and communication, this paper presents a study on Korean dialect machine translation for dialect speakers who might otherwise be marginalized. The dialect data used were the AIHUB dialect datasets, which are distributed by top-level administrative region. Based on these data, we present a modeling study that applies a Transformer-based copy mechanism to improve the performance of dialect machine translation, and we propose a many-to-one dialect machine translator to make model deployment more efficient. We compare and analyze the performance of the one-to-one and many-to-one models and examine the results from various linguistic perspectives. In terms of BLEU score, the experiments show a performance improvement for the one-to-one translator using the proposed method and meaningful performance for the many-to-one translator.
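The copy mechanism applied here follows a standard mixture form: the output distribution interpolates the decoder's vocabulary distribution with a copy distribution over source tokens, which helps keep dialect-specific words intact. A sketch with toy probabilities, not the paper's model:

```python
def copy_mixture(p_gen, p_vocab, attn, src_tokens, vocab):
    # P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: src_i = w} attn_i
    # p_gen: generation gate in [0, 1]; p_vocab: decoder distribution over
    # vocab; attn: attention weights over the source tokens.
    out = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    for a, tok in zip(attn, src_tokens):
        out[tok] = out.get(tok, 0.0) + (1 - p_gen) * a
    return out
```

A source-only word (e.g., a dialect form absent from the target vocabulary) still receives probability mass through the copy term, so the translator can pass it through unchanged.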

SPARQL-to-SQL 변환 알고리즘의 저장소 독립적 활용을 위한 시스템 모델 (A System Model for Storage Independent Use of SPARQL-to-SQL Translation Algorithm)

  • 손지성; 정동원; 백두권
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 14 No. 5 / pp. 467-471 / 2008
  • As research on web ontologies has grown, various types of repositories and query languages for storing web ontologies have been developed. As the use of SPARQL has increased and most repositories are based on relational databases, the need for algorithms that translate SPARQL into SQL has emerged. The translation algorithms proposed so far either translate only a subset of SPARQL into SQL or are dependent on the repository structure. This paper proposes a model that allows a given translation algorithm to be used independently of the repository.
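To make the translation task concrete, here is a minimal mapping of a single SPARQL triple pattern onto a generic `(subject, predicate, object)` triple table. The table and column names are assumptions; real stores use different schemas, which is exactly the storage-dependence problem the paper targets.

```python
def triple_to_sql(s, p, o, table="triples"):
    # Variables (tokens starting with "?") become selected columns;
    # constants become WHERE-clause equality conditions.
    conds = []
    for col, val in (("subject", s), ("predicate", p), ("object", o)):
        if not val.startswith("?"):
            conds.append(f"{col} = '{val}'")
    where = " WHERE " + " AND ".join(conds) if conds else ""
    return f"SELECT subject, predicate, object FROM {table}{where}"
```

Full SPARQL-to-SQL translation must additionally handle joins between triple patterns, OPTIONAL, FILTER, and so on; a storage-independent model would keep this logic fixed while swapping out the schema mapping.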

영한기계번역과 대용어 조응문제에 대한 고찰 (English-to-Korean Machine Translation and the Problem of Anaphora Resolution)

  • Ruslan Mitkov
    • 한국음향학회:학술대회논문집 / Proceedings of the 11th Workshop on Speech Communication and Signal Processing (SCAS Vol. 11 No. 1) / pp. 351-357 / 1994
  • At least two projects for English-to-Korean translation have already been underway for the last few years, but so far no attention has been paid to the problem of resolving pronominal reference, and a default pronoun translation has been used instead. In this paper we argue that pronouns cannot be handled trivially in English-to-Korean translation and that one cannot bypass the task of resolving anaphoric reference when aiming at good and natural translation. In addition, we propose lexical transfer rules for English-to-Korean anaphor translation and outline an anaphora resolution model for an English-to-Korean MT system in operation.


번역: 대응과 평가 (Translation:Mapping and Evaluation)

  • 장석진
    • 한국언어정보학회지:언어와정보 / Vol. 2 No. 1 / pp. 1-41 / 1998
  • Evaluation of multilingual translation fundamentally involves measuring meaning equivalences between the formally mapped discourses/texts of the SL (source language) and TL (target language), both represented by a metalanguage called IL (interlingua). Unlike a usual uni-directional MT (machine translation) model (e.g., SL → analysis → transfer → generation → TL), a bi-directional (by 'negotiation') model (i.e., SL → IL/S ↔ IL ↔ IL/T ← TL) is proposed here for the purpose of evaluating multilingual, not merely bilingual, translation. The IL, as conceived of in this study, is an English-based predicate logic represented in the framework of MRS (minimal recursion semantics), an MT-oriented offshoot of HPSG (Head-driven Phrase Structure Grammar). In addition, a list of semantic and pragmatic checkpoints is set up, some optional depending on the kind and use of the translation, so as to make the evaluation of translation fine-grained by computing the matching or mismatching of such checkpoints.


Self-Attention 시각화를 사용한 기계번역 서비스의 번역 오류 요인 설명 (Explaining the Translation Error Factors of Machine Translation Services Using Self-Attention Visualization)

  • 장청롱; 안현철
    • 한국IT서비스학회지 / Vol. 21 No. 2 / pp. 85-95 / 2022
  • This study analyzed the translation error factors of machine translation services such as Naver Papago and Google Translate through Self-Attention path visualization. Self-Attention is a key method of the Transformer and BERT NLP models and has recently been widely used in machine translation. We propose a method to explain the translation error factors of machine translation algorithms by comparing the Self-Attention paths of an ST (source text) and an ST' (a transformed ST whose meaning is unchanged but whose translation output is more accurate). This method provides explainability for analyzing a machine translation algorithm's internal process, which is otherwise invisible, like a black box. In our experiment, we were able to explore the factors that caused translation errors by analyzing the differences in the attention paths of key words. The study used the XLM-RoBERTa multilingual NLP model provided by exBERT for Self-Attention visualization and applied it to two examples, Korean-Chinese and Korean-English translation.
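The core comparison, locating where attention shifts between ST and its paraphrase ST', can be sketched with plain attention matrices. The toy weights below are illustrative; the study visualized real XLM-RoBERTa attention via exBERT rather than computing a distance like this.

```python
def attention_shift(attn_a, attn_b):
    # attn_a, attn_b: per-token attention rows (same shape) for ST and ST'.
    # Returns (token index, L1 distance) of the row that changed the most,
    # i.e., the token whose attention path shifted between the two inputs.
    diffs = [sum(abs(x - y) for x, y in zip(ra, rb))
             for ra, rb in zip(attn_a, attn_b)]
    i = max(range(len(diffs)), key=diffs.__getitem__)
    return i, diffs[i]
```

Tokens whose attention paths shift sharply between a mistranslated input and its correctly translated paraphrase are natural suspects for the error factor, which is the diagnostic the paper performs visually.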