• Title/Summary/Keyword: 축소 번역 (reduction, translation)

Search Result 10, Processing Time 0.025 seconds

Performance Improvement of Extracting Bilingual Term from Phrase Table using Sentence Length Reduction (문장 길이 축소를 이용한 구 번역 테이블에서의 병렬어휘 추출 성능 향상)

  • Jeong, Seon-Yi;Lee, Kong-Joo
    • Annual Conference on Human and Language Technology / 2013.10a / pp.120-125 / 2013
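  • This study concerns a method for effectively extracting bilingual terms from a large, domain-specific Korean-English parallel corpus using a statistical machine translation system. Because Korean and English belong to different language families, differences in sentence length and word order can lower phrase translation accuracy when a statistical translation system translates terms. The problem can grow worse as sentences become longer. Under these conditions, this study uses a corpus with reduced sentence lengths to improve the bilingual term extraction performance of the phrase translation table within limited corpus resources.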

Efficient Rule-based OWL Reasoning by Combing Meta Rules and Translation (메타 규칙과 번역의 혼용을 통한 규칙엔진 기반 OWL 추론 엔진의 성능 향상 방법)

  • Jang, Min-Su;Sohn, Joo-Chan;Cho, Young-Jo
    • Proceedings of the Korean Information Science Society Conference / 2007.06d / pp.214-219 / 2007
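  • OWL reasoning engines built on production rules and the rule engines that execute them have relied on meta rules. Meta rules express OWL semantics easily, which makes an OWL reasoning engine simpler to implement, but they have not delivered satisfactory results in either reasoning speed or large-scale ontology processing. This paper introduces an OWL reasoning technique that combines meta rules with translation rules based on the translation approach of DLP (Description Logic Programming). On the LUBM benchmark, this technique improved reasoning performance by more than 100% compared with using meta rules alone and also greatly reduced memory usage. In addition, translation makes it possible to support a wider range of OWL reasoning, such as reasoning over unrestricted cardinality restrictions.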

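The contrast between meta rules and translation rules in the abstract above can be illustrated with a toy forward-chaining sketch. This is not the authors' engine: the triple layout, the function names, and the class names in `ONTOLOGY` are all hypothetical, and only the `subClassOf` rule is covered. The meta-rule version joins every instance fact against the schema triples on each pass, while the translated version first compiles each `subClassOf` axiom into a dedicated class-to-class rule.

```python
# Toy illustration only; names and data are assumptions, not from the paper.
# Facts are (subject, predicate, object) triples.

ONTOLOGY = [
    ("GradStudent", "subClassOf", "Student"),
    ("Student", "subClassOf", "Person"),
    ("kim", "type", "GradStudent"),
]


def meta_rule_closure(triples):
    """Generic meta rule: (?c subClassOf ?d), (?x type ?c) -> (?x type ?d)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the schema and instance facts, then join them.
        sub = [(s, o) for (s, p, o) in facts if p == "subClassOf"]
        typed = [(s, o) for (s, p, o) in facts if p == "type"]
        for c, d in sub:
            for x, cls in typed:
                if cls == c and (x, "type", d) not in facts:
                    facts.add((x, "type", d))
                    changed = True
    return facts


def translated_closure(triples):
    """Translate each subClassOf axiom into its own rule, then apply the rules."""
    rules = {}  # class -> list of direct superclasses, i.e. C(x) -> D(x)
    for s, p, o in triples:
        if p == "subClassOf":
            rules.setdefault(s, []).append(o)
    facts = set(triples)
    agenda = [(s, o) for (s, p, o) in triples if p == "type"]
    while agenda:
        x, cls = agenda.pop()
        for d in rules.get(cls, []):  # direct lookup, no join over the schema
            new_fact = (x, "type", d)
            if new_fact not in facts:
                facts.add(new_fact)
                agenda.append((x, d))
    return facts
```

Both functions derive the same facts for `ONTOLOGY`; the sketch only shows where the two styles differ in how rules are matched and makes no claim about the speedups reported in the abstract.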

Development of a G-machine Based Translator for a Lazy Functional Programming Language Miranda (지연함수언어 Miranda의 G-기계 기반 번역기 개발)

  • Lee, Jong-Hui;Choe, Gwan-Deok;Yun, Yeong-U;Gang, Byeong-Uk
    • The Transactions of the Korea Information Processing Society / v.2 no.5 / pp.733-745 / 1995
  • This study aims to construct a translator for a functional programming language. To this end, we define a functional programming language with lazy semantics and develop a translator for it. The execution model selected is G-machine-based combinator graph reduction. The translator consists of four phases and translates a source program into a C program. The first phase translates a source program into an enriched lambda-calculus graph, the second phase transforms the lambda-calculus graph into supercombinators, the third phase translates the supercombinators into a G program, and the last phase translates the G program into a C program. The final output of the translator, a C program, is compiled into an executable by a C compiler. The translator is implemented in C using compiler development tools such as YACC and Lex under the UNIX environment. In this paper we present the design and implementation techniques used to develop the translator and show results from executing some test problems.

Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation (기계번역을 이용한 교차언어 문서 범주화의 분류 성능 분석)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.313-332 / 2009
  • Cross-language text categorization (CLTC) can classify documents automatically using a training set in another language. In this study, collections appropriate for CLTC were extracted from KTSET. The classification performance of various CLTC methods was compared using an SVM classifier and machine translation. The results showed that classification performance ranked in the order of the poly-lingual training method, training-set translation, and test-set translation. However, training-set translation could be regarded as the most useful CLTC method, because it was efficient for machine translation and easily adapted to a general environment. On the other hand, low performance was shown to be due to feature reduction or to features with no subject characteristics, both of which arose during the machine translation step of CLTC.

A Study on the Korean Translation Strategy of 《Mu Yang Ai Hua, 牧羊哀話》 by Period (《목양애화(牧羊哀話)》의 시대별 한국어 번역 전략 연구)

  • Moon, Dae-il
    • The Journal of the Convergence on Culture Technology / v.7 no.1 / pp.377-382 / 2021
  • 《Mu Yang Ai Hua, 牧羊哀話》 is known as the first Korean-sanctioned novel in the history of modern Chinese literature, and is famous as a novel the author wrote after being inspired by his own visit to Korea. 《牧羊哀話》 has been repeatedly retranslated (four versions). These translations reflect the characteristics of each period, and the translation strategies they use have their own characteristics. The results of the comparative analysis of the four translations in this study are as follows. Translation A was published during the Japanese colonial period; some parts were reduced or omitted according to the intent of the translator, and a foreignization translation strategy was used. Translations B, C, and D achieve content equivalence by making heavy use of localization translation strategies, and add supplementary explanations in places to help readers understand. Since translation is a process of communication, it should not simply convert the source text into the target text; the target reader's response to the work should be the same as that of a reader of the source text. Translation must therefore take into account the environment of the times and the readership, and it must use all possible methods to elicit the same emotion and empathy that a reader of the original text experiences. Translators accordingly need to use localization and foreignization strategies together, based on their understanding of the target language and of the politics, economy, history, culture, and so on of the destination country.

Interaction of native language interference and universal language interference on L2 intonation acquisition: Focusing on the pitch range variation (L2 억양에서 나타나는 모국어 간섭과 언어 보편적 간섭현상의 상호작용: 피치대역을 중심으로)

  • Yune, Youngsook
    • Phonetics and Speech Sciences / v.13 no.4 / pp.35-46 / 2021
  • In this study, we examined the interaction between pitch reduction, which is considered a universal language phenomenon, and native language interference in the production of L2 intonation by Chinese learners of Korean. To investigate their interaction, we conducted an acoustic analysis using acoustic measures such as pitch span, pitch level, pitch dynamic quotient, skewness, and kurtosis. In addition, the correlation between text comprehension and pitch was examined. The analyzed material consisted of four Korean discourses containing five and seven sentences of varying difficulty. Seven Korean native speakers and thirty Chinese learners who differed in their Korean proficiency participated in the production test. Comparing the two languages, the results showed that Chinese had a more expanded pitch span and a higher pitch level than Korean. The analysis between groups showed that at the beginner and intermediate levels, pitch reduction was prominent, i.e., the learners' Korean was characterized by a compressed pitch span, low pitch level, and less sentence-internal pitch variation. In contrast, the pitch use of advanced speakers was most similar to that of Korean native speakers. There was no significant correlation between text difficulty and pitch use. Through this study, we observed that pitch reduction was more pronounced than native language interference in the phonetic layer.
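
The acoustic measures named in the abstract above can be sketched from a list of F0 samples. The abstract does not define the measures, so the definitions here are assumptions: span is taken as maximum minus minimum in Hz, level as the mean, pitch dynamic quotient as standard deviation divided by mean, and skewness and kurtosis as population moments (kurtosis reported as excess kurtosis). The function name `pitch_stats` and the sample contours in the usage note are hypothetical.

```python
import statistics


def pitch_stats(f0_hz):
    """Summarize an F0 contour (voiced frames only, values in Hz)."""
    n = len(f0_hz)
    mean = statistics.fmean(f0_hz)
    sd = statistics.pstdev(f0_hz)  # population standard deviation
    m3 = sum((x - mean) ** 3 for x in f0_hz) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in f0_hz) / n  # fourth central moment
    return {
        "span": max(f0_hz) - min(f0_hz),   # assumed: max - min in Hz
        "level": mean,                     # assumed: mean F0
        "pdq": sd / mean,                  # assumed: SD / mean
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4 - 3.0,    # excess kurtosis
    }
```

Under these definitions, a compressed contour such as `[130, 135, 140, 145, 150]` has a smaller span and pitch dynamic quotient than a wider one such as `[100, 120, 140, 160, 180]` with the same level, which is the kind of difference the abstract describes as pitch reduction.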

Recognition and Evaluation of Efficient Language Analysis Unit for Korean (한국어에서 실용적 언어분석 단위의 인식과 평가)

  • 박인철
    • Journal of the Korea Computer Industry Society / v.5 no.1 / pp.65-76 / 2004
  • In this paper, we observe the differences between the linguistic and computational aspects of automatically processing language, which is the dominant method of representing information on the Internet. For efficient information retrieval, information extraction, and machine translation over massive document collections, we investigate the analysis units used in morphological, syntactic, and semantic analysis, and propose the longest syntactic analysis unit in place of the linguistically based morphological unit. By evaluating on massive document collections, we also show that the proposed analysis units can serve as a constraint that reduces the ambiguity arising in language processing.

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method that uses only a raw corpus, without additional human effort, to disambiguate target word selection in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These two techniques can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with the two techniques and use that knowledge for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection and estimate the distance between instances based on these models. In the experiments, we use TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we showed the relatedness between accuracy and two important factors: the dimensionality of the latent space and the k value of k-NN learning.
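
The k-nearest neighbor step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes each known collocate already has a latent-space vector (as LSA or PLSA would supply), uses cosine similarity as the closeness measure, and picks the target word by majority vote. The two-dimensional vectors, the romanized Korean labels, and the names `EXAMPLES` and `select_target` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical latent vectors for object nouns seen with the verb "break",
# paired with the Korean target word chosen for the verb in that context.
EXAMPLES = [
    ([0.9, 0.1], "kkaeda"),   # break a window
    ([0.8, 0.2], "kkaeda"),   # break a glass
    ([0.1, 0.9], "eogida"),   # break a law
    ([0.2, 0.8], "eogida"),   # break a promise
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_target(query_vec, examples, k=3):
    """Vote among the k examples closest to query_vec in the latent space."""
    ranked = sorted(examples, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For an unseen noun whose assumed vector is `[0.85, 0.15]`, `select_target([0.85, 0.15], EXAMPLES)` returns `"kkaeda"`, because two of its three nearest neighbors carry that label. Backing off to neighbors in this way is what lets the method handle collocations that never appeared in the dictionary.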

A Study on the Satirical Content Plot of an Absurd Play - Focused on Lee Keun-sam's Play - (부조리극의 풍자적 콘텐츠 플롯 연구 - 이근삼 희곡 <원고지>를 중심으로 -)

  • Son, Dae-Hwan
    • Journal of Korea Entertainment Industry Association / v.13 no.5 / pp.73-82 / 2019
  • The satirical content of the absurd play, centered on Lee Keun-sam's play, represents the family image of a modern capitalist society where only duty is emphasized while the characters are lost in love with the family. It shows humans becoming subordinate to economic logic as traditional relationships and family relationships change into material ones due to the rapid development of the economy. The narrator acts as both performer and narrator. The narrator also presents the plot, a characteristic element of epic and absurd dramas, and directs the actors in the manner of a director. The narrator also foretells events that will take place in the future, presents the inner consciousness of the characters in the play, and reduces and expands events and times. In terms of conflict, in order to fulfill the financial responsibility for their children, the professor translates like a machine and the wife distributes the money they earn as the children demand. The middle-aged professor and his wife are not willing to make a difference in the real world, so no concrete conflict appears in the work. The plot consists of 22 epicentre compartments and spans a time frame from evening to the next morning. No special events happen; the play shows only one family's daily life. In addition, materials that show the simple repetition of daily life, such as newspapers, rice, and birthdays, effectively show the character of absurdity through a repeated structure. The linguistic features of the absurd play focus on expressing anxiety, despair, fantasy, and the sense of loss that the object's purpose has disappeared. The stage system avoids the detailed portrayals of naturalist plays and creates a thoroughly simplified image that the theme of the play demands, which shows that the stage unit is also an important element that characterizes the absurdity of reflexes.

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.125-132 / 2022
  • Sentence compression is a natural language processing task that generates concise sentences that preserve the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies utilized human-defined linguistic rules. Furthermore, since sequence-to-sequence models perform well on various natural language processing tasks, such as machine translation, there have been studies that utilize them for sentence compression. However, in the linguistic rule-based studies all rules have to be defined by humans, and the sequence-to-sequence model based studies require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter uses a perplexity-based score computed with BERT to compress sentences, no linguistic rules or parallel dataset are required for sentence compression. However, because Deleter compresses sentences considering only perplexity, it does not reflect the linguistic information of the words in the sentences. Furthermore, since the datasets used for pre-training BERT are far from compressed sentences, this can lead to incorrect sentence compression. To address these problems, this paper proposes a method to quantify the importance of linguistic information and reflect it in perplexity-based sentence scoring. Furthermore, by fine-tuning BERT with a corpus of news articles, which often contain proper nouns and often omit unnecessary modifiers, we allow BERT to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the sentence compression performance of sentence-scoring based models can be improved by utilizing the proposed method.
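
The idea of combining a fluency score with linguistic importance during deletion-based compression can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the paper's model: the `fluency` argument stands in for a BERT perplexity-based score (a constant placeholder is used in the usage note), the part-of-speech weights in `IMPORTANCE` are invented for the example, and the greedy one-token-at-a-time search and the names `make_score` and `compress` are hypothetical.

```python
# Hypothetical importance weights per part-of-speech tag (not from the paper).
IMPORTANCE = {"PROPN": 1.5, "NOUN": 1.0, "VERB": 1.0, "DET": 0.5, "ADJ": 0.3, "ADV": 0.2}


def make_score(fluency, alpha=1.0):
    """Build a sentence scorer: fluency plus weighted linguistic importance.

    `fluency` maps a list of (word, tag) pairs to a number where higher is
    better, for example a negated perplexity from a language model.
    """
    def score(tagged):
        kept_importance = sum(IMPORTANCE.get(tag, 0.5) for _, tag in tagged)
        return fluency(tagged) + alpha * kept_importance
    return score


def compress(tagged, score, target_len):
    """Greedily delete one token at a time, keeping the best-scoring candidate."""
    current = list(tagged)
    while len(current) > target_len:
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        current = max(candidates, key=score)
    return current
```

With a placeholder fluency of `lambda tagged: 0.0`, compressing `[("The", "DET"), ("very", "ADV"), ("old", "ADJ"), ("bridge", "NOUN"), ("finally", "ADV"), ("collapsed", "VERB")]` to three tokens keeps "The bridge collapsed", because the low-weight modifiers are removed first. Passing the scorer in as a function keeps the search loop independent of whichever language model supplies the fluency term.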