• Title/Summary/Keyword: Language Translation


A Study on the Korean Parts-of-Speech for Korean-English Machine Translation (기계번역용 한국어 품사에 관한 연구)

  • 송재관;박찬곤
    • Journal of the Korea Society of Computer and Information / v.5 no.4 / pp.48-54 / 2000
  • This paper classifies Korean parts of speech for Korean-English machine translation and investigates the morphological characteristics of each part of speech. Standard Korean grammar classifies parts of speech by semantic, functional, and formal criteria. Having many rules makes grammatical structure and part-of-speech classification difficult to understand, and it makes preprocessing necessary in machine translation. This paper instead classifies Korean parts of speech by a single rule. The proposed parts of speech play the same syntactic roles as, and correspond to, the parts of speech used in English dictionaries, and they express the structure of Korean sentences. They also allow the target-language sentence to be generated by pattern matching in Korean-English translation.


DaHae: Japanese Morphological Analyzer for Japanese to Korean Machine Translation (DaHae: 일한 기계번역을 위한 일본어 형태소 분석기)

  • Yuh, Sang-Hwa;Jung, Han-Min;Chang, Won;Kim, Tae-Wan;Hwang, Do-Sam;Park, Dong-In
    • Annual Conference on Human and Language Technology / 1995.10a / pp.195-207 / 1995
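  • Japanese uses several kinds of script, including kanji, hiragana, and katakana, and mixes them so heavily that text remains readable even without word spacing. In existing electronic dictionaries such as the ICOT dictionary, the EDR dictionary, and the ATLAS I/JK dictionary, headwords that mix character types (excluding kanji+hiragana headwords) account for only 8.8% of entries on average. A change of character type within a sentence can therefore serve as a delimiter between words. The system places a preprocessor before morphological analysis; it splits the input into fragments using character type information, handles exception words and fixed expressions, and proposes a morphological analysis method for each fragment. The morphological analyzer takes the preprocessor's output, analyzes each fragment with the method the preprocessor proposed, and extracts all possible analyses of the input sentence. This approach improves analysis performance by reducing unnecessary dictionary lookups and connectivity checks.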

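
The fragment-splitting idea in the DaHae abstract can be illustrated with a minimal Python sketch. It is not the DaHae preprocessor itself: the function names `char_type` and `split_fragments`, the coarse Unicode ranges, and the sample sentence are all assumptions made for illustration, and the sketch does not handle exception words or fixed expressions.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character by Unicode block (coarse, illustrative ranges)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # includes the prolonged sound mark U+30FC
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def split_fragments(sentence: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same type into (type, text) fragments."""
    return [
        (kind, "".join(chars))
        for kind, chars in groupby(sentence, key=char_type)
    ]


# Hypothetical sample sentence: every change of character type starts a new fragment.
fragments = split_fragments("東京でコーヒーを飲む")
for kind, text in fragments:
    print(kind, text)
```

Note that this naive version also splits at kanji-to-hiragana boundaries. The abstract sets kanji+hiragana headwords apart when counting mixed-type entries, so a real preprocessor would presumably need to treat that boundary differently.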

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

  • Lee, Chi Hoon;Lee, Yeon Ji;Lee, Dong Hee
    • Journal of Information Technology Services / v.19 no.5 / pp.83-91 / 2020
  • Language models such as BERT have become an important component of deep learning-based natural language processing. Pre-training transformer-based language models is computationally expensive because they consist of deep, wide architectures built on attention mechanisms and require huge amounts of training data. Hence, it has become necessary to fine-tune large pre-trained language models trained by Google or other companies that can afford the resources and cost. There are various techniques for fine-tuning language models; this paper examines three: data augmentation, hyperparameter tuning, and partial reconstruction of the neural network. For data augmentation, we use no-answer augmentation and back-translation. We also identify some useful combinations of hyperparameters through a number of experiments. Finally, we add GRU and LSTM networks on top of the pre-trained BERT model to boost performance. Fine-tuning the pre-trained Korean language model with these methods raises the F1 score from the baseline to 89.66. Moreover, some failed attempts provide important lessons and point to directions for future work.
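
The abstract reports an F1 score but does not say how it is computed. As a rough illustration, the sketch below shows the token-overlap F1 commonly used for extractive question answering. That this is the paper's metric is an assumption; the function name `token_f1` and the whitespace tokenization are also assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # No-answer case: score 1.0 only if both sides are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the red car", "red car")  # precision 2/3, recall 1
```

The empty-string branch is what makes no-answer examples scoreable: a model that correctly predicts "no answer" gets full credit, and any non-empty prediction against an empty reference gets none.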

A Contemplation on Language Fusion Phenomenon of Chinese Neologism Derived from Korean (한국어 차용 중국어 신조어의 언어융합 현상 고찰)

  • JUNG, EUN
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.261-268 / 2022
  • No language exists independently, cut off from other languages. When a language comes into contact with a foreign culture, the two continuously affect each other and bring about change. The Hallyu (Korean Wave) boom, which arose from the emergence of K-drama and K-pop amid rapid developments in global science and technology and digitization after the 1990s, has affected the Chinese language. As a result, neologisms derived from Korean are commonly used in everyday exchanges and have become social buzzwords. These neologisms reflect the effects and results of contact between the two languages. We examine the background and causes of Chinese neologisms derived from Korean in terms of sociocultural factors and psychological necessity, and explain them using four categories: transliteration, free translation, borrowing Korean-Chinese characters, and others. Although the coining of new words raises the issue of being anti-normative, neologisms enrich Chinese expression and mirror social culture, reflecting the opinions and understandings of young Chinese people who pursue novelty, change, innovation, and creativity in language. We hope this study will give young people in Korea and China an opportunity to change their perceptions and grow friendlier by understanding each other's language and culture and by communicating. We also expect it to assist in teaching and learning applications of Korean-Chinese language fusion in the field of Chinese language education.

Several Legal Issues on Arbitration Agreement under the New York Convention Raised by the Recent Supreme Court Decision of Korea of December 10, 2004 (국제상사중재에서의 중재합의에 관한 법적 문제점 -대법원 2004, 12. 10. 선고 2004다20180 판결 이 제기한 뉴욕협약상의 쟁점들을 중심으로-)

  • Suk Kwang-Hyun
    • Journal of Arbitration Studies / v.15 no.2 / pp.225-261 / 2005
  • Under Article IV of the United Nations Convention on the Recognition and Enforcement of Foreign Arbitral Awards (New York Convention), a party applying for recognition and enforcement of a foreign arbitral award shall supply (a) the duly authenticated original award or a duly certified copy thereof and (b) the original arbitration agreement or a duly certified copy thereof. In addition, if the arbitral award or arbitration agreement is not made in an official language of the country in which the award is relied upon, the applying party shall produce a translation of these documents into such language, and the translation shall be certified by an official or sworn translator or by a diplomatic or consular agent. In a case where a Vietnamese company that had obtained a favorable arbitral award in Vietnam applied to a Korean court for its recognition and enforcement, the recent Korean Supreme Court judgment (Docket No. 2004 Da 20180, the 'Judgment') rendered on December 10, 2004 alleviated the document requirements as follows. The Judgment held that (i) the party applying for recognition and enforcement of a foreign arbitral award does not have to strictly comply with the document requirements when the other party does not dispute the existence and the content of the arbitral award and the arbitration agreement, and that (ii) where the translation submitted to the court does not satisfy the requirement of Article IV, the court does not have to dismiss the case on the ground that the applicant has failed to comply with the translation requirement, and may instead supplement the documents by obtaining an accurate Korean translation from an expert translator at the applicant's expense. In this regard, the author fully supports the view of the Judgment. Finally, the Judgment held that, even though the existence of a written arbitration agreement was not disputed in the arbitration, there was no written arbitration agreement between the plaintiff and the defendant, and went on to reverse the judgment of the second instance, which had admitted the existence of a written arbitration agreement between the parties. In this regard, the author does not share the view of the Judgment. The author believes that, considering the trend of alleviating the formality requirement for arbitration agreements under Article II of the New York Convention, the Supreme Court could have concluded that there was a written arbitration agreement because the defendant participated in the arbitration proceedings in Vietnam without disputing the formality requirement of the arbitration agreement. Alternatively, the Supreme Court should have taken the view that the defendant was no longer permitted to dispute the formality requirement of the arbitration agreement, because otherwise it would clearly be against the doctrine of estoppel.


Parsing the Wh-Interrogative Construction in Korean

  • Yang, Jaehyung;Kim, Jong-Bok
    • Language and Information / v.17 no.2 / pp.51-66 / 2013
  • Korean is a wh-in-situ language where the wh-expression stays in situ with an obligatory Q-particle marking its interrogative scope. This paper briefly reviews some basic properties of the wh-question construction in Korean and shows how a typed feature structure grammar, HPSG (Pollard and Sag 1994, Sag et al. 2003), together with the notions of 'type hierarchy' and 'constructions', can provide a robust basis for parsing the wh-construction in the language. We show that this system induces robust syntactic structures as well as enriched semantic representations for real-time applications such as machine translation, which require deep processing of the phenomena concerned.

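
Feature-structure grammars such as the HPSG analysis described above rest on unification. The sketch below is a deliberately simplified illustration using plain nested dictionaries; it omits the type hierarchy, reentrancy (structure sharing), and constructions that the paper relies on. The function name `unify` and the feature names (`HEAD`, `POS`, `CASE`, `WH`) are hypothetical and are not taken from the paper's grammar.

```python
def unify(a, b):
    """Unify two feature structures; return the merged structure or None on a clash.

    A feature structure here is either an atomic value or a dict mapping
    feature names to feature structures.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values for the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a dict) unify only when they are equal.
    return a if a == b else None


# Hypothetical lexical entry for a wh-expression and two candidate constraints.
wh_word = {"HEAD": {"POS": "noun"}, "WH": "+"}
object_slot = {"HEAD": {"POS": "noun", "CASE": "acc"}}
verb_slot = {"HEAD": {"POS": "verb"}}

compatible = unify(wh_word, object_slot)
clash = unify(wh_word, verb_slot)
```

Unification succeeds when the two structures carry compatible information and merges what each side contributes; a conflicting value for the same feature makes the whole unification fail.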

Multi-Lingual Spoken Language Translation System using CSTAR-IF (CSTAR-IF를 이용한 다국어 대화체 번역시스템)

  • Choi, Un-Cheon
    • Annual Conference on Human and Language Technology / 1998.10c / pp.159-163 / 1998
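  • The multilingual spoken-language translation system is the Korean-side translation system for the 1999 international speech translation demonstration of CSTAR, whose members include Carnegie Mellon University in the United States, ATR in Japan, and the Electronics and Telecommunications Research Institute in Korea. CSTAR-IF is the unit, or format, of information exchanged between the national systems for the international demonstration. It is a form of interlingua representation, defined so that meanings occurring within a specific domain can be expressed in a brief, simple notation. The multilingual translation system has two main parts: an analysis system that converts Korean speech recognition output into IF, and a generation system that generates Korean sentences from IF and outputs them as speech. The Korean analysis system currently achieves a 92% analysis success rate, and the generation system a 98% generation success rate.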


Korean Analysis and Transfer in Unification-based Multilingual Machine Translation System (통합기반 다국어 자동번역 시스템에서의 한국어 분석과 변환)

  • Choi, Sung-Kwon;Park, Dong-In
    • Annual Conference on Human and Language Technology / 1996.10a / pp.301-307 / 1996
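  • Multilingual machine translation means translation among two or more languages. Existing multilingual machine translation systems can be broadly classified into transfer-based and pivot approaches. In transfer-based systems, the analysis and generation rules for each language were written differently, so commonalities among languages were not captured, and the overall memory footprint of the translation system grew as a result. Existing pivot approaches, in turn, faced the difficulty of implementing a model of linguistic universals applicable to many languages. To overcome these shortcomings, this paper proposes a multilingual machine translation system based on common rules that capture commonalities among languages and can be shared across several languages. Computationally, the advantage of common rules is that they are loaded only once for several languages, which reduces the overall memory footprint; linguistically, it is that grammar information can be written, revised, and managed consistently.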


An Analysis of Feasibility of Sentence Frame Based Method for Korean to English Translation System (한영 번역 시스템을 위한 문틀 기반 번역 방식의 실현성 분석)

  • Kim, Young-Kil;Seo, Young-Ae;Seo, Kwang-Jun;Choi, Sung-Kwon
    • Proceedings of the Korea Information Processing Society Conference / 2000.10a / pp.261-264 / 2000
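  • Korean-English translation has so far been dominated by rule-based methods, but pattern-based translation is now being actively studied. Pattern-based methods, however, have a critical weakness in coverage. This paper therefore divides Korean patterns into general sentence frames at the eojeol (word-phrase) level and predicate-centered sentence frames built around verb phrases, and investigates the coverage and feasibility of each. In the experiment, sentence frames were built automatically from 351,806 broadcast caption sentences using an existing morphological analyzer, and coverage was then tested on 4,995 test sentences. In short, the paper examines the coverage of general and predicate-centered sentence frames for Korean-English translation of broadcast captions, evaluates the feasibility of the sentence-frame approach, and suggests directions for future Korean-English translation system development.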

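
The coverage test described in the abstract can be sketched as follows. The abstract does not define what a sentence frame looks like, so this sketch makes a loud assumption: a frame is simply the sequence of part-of-speech tags of a pre-tagged sentence. The function names, the tag labels, and the toy data are all hypothetical; the paper's frames are built with a morphological analyzer and are surely richer.

```python
def to_frame(tagged_sentence):
    """Reduce a tagged sentence to a frame: here, just its tag sequence (assumption)."""
    return tuple(tag for _word, tag in tagged_sentence)


def build_frames(corpus):
    """Collect the set of distinct frames observed in a training corpus."""
    return {to_frame(sentence) for sentence in corpus}


def coverage(frames, test_sentences):
    """Fraction of test sentences whose frame already exists in the frame set."""
    if not test_sentences:
        return 0.0
    hits = sum(1 for sentence in test_sentences if to_frame(sentence) in frames)
    return hits / len(test_sentences)


# Toy pre-tagged data with made-up tags, standing in for analyzer output.
training = [
    [("나", "NP"), ("는", "JX"), ("간다", "VV")],
    [("비", "NN"), ("가", "JKS"), ("온다", "VV")],
]
test = [
    [("너", "NP"), ("는", "JX"), ("온다", "VV")],  # frame seen in training
    [("빨리", "MAG"), ("간다", "VV")],  # frame not seen in training
]

frames = build_frames(training)
rate = coverage(frames, test)
```

Under this simplification, coverage is just a set-membership rate, which makes the trade-off visible: more abstract frames match more test sentences but carry less information for translation.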

A Machine Independent Automatic Microcode Generation (머신 독립적인 마이크로코드 자동 생성)

  • Park, B.S.;Min, K.C.;Kim, Y.J.;Lee, S.J.;Lim, I.C.
    • Proceedings of the KIEE Conference / 1988.07a / pp.651-654 / 1988
  • This paper proposes a microcode generation system that automatically generates microcode for various target machines, taking as input an intermediate language (MDIL) produced from the machine-independent HLML-C (High Level Microprogramming Language C). MOPs (micro-operations), modeled as 7-tuples, are generated by expanding MDIL with a table-driven method that uses the translation table for each target machine. By taking compaction into account and considering the hardware resources the target machine uses, conflicts among hardware elements are removed where possible. The proposed system is implemented in C and yacc on a VAX-11/750 (UNIX 4.3 BSD).

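
The table-driven step in the abstract, where a per-machine translation table turns intermediate-language statements into micro-operations, can be sketched in a few lines. Everything concrete here is hypothetical: the opcodes, the micro-operation templates, the machine names, and the three-field statement format are invented for illustration and do not reproduce MDIL or the paper's 7-tuple MOP model. Compaction and conflict removal are not shown.

```python
# Hypothetical translation tables: one per target machine, mapping an
# intermediate-language opcode to a list of micro-operation templates.
TRANSLATION_TABLES = {
    "machine_a": {
        "MOVE": ["BUS_LOAD {src}", "REG_STORE {dst}"],
        "ADD": ["ALU_ADD {dst},{src}"],
    },
    "machine_b": {
        "MOVE": ["XFER {dst},{src}"],
        "ADD": ["LATCH {src}", "ALU_ADD {dst}"],
    },
}


def generate_microcode(il_program, target):
    """Expand each (opcode, dst, src) statement using the target's table."""
    table = TRANSLATION_TABLES[target]
    microcode = []
    for opcode, dst, src in il_program:
        for template in table[opcode]:
            microcode.append(template.format(dst=dst, src=src))
    return microcode


# The same machine-independent program yields different microcode per target.
program = [("MOVE", "R1", "R2"), ("ADD", "R1", "R3")]
code_a = generate_microcode(program, "machine_a")
code_b = generate_microcode(program, "machine_b")
```

Keeping the machine-specific knowledge in data rather than in code is what lets one generator serve several targets: supporting a new machine means adding a table, not changing the generator.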