• Title/Summary/Keyword: Source language

Search Result 484, Processing Time 0.029 seconds

Analysis of Korean Language by First Order Markov Source (한글의 First Order Markov Source에 의한 해석)

  • 한영렬;박종원
    • Proceedings of the Korean Institute of Communication Sciences Conference
    • /
    • 1982.10a
    • /
    • pp.51-55
    • /
    • 1982
  • The analysis of Korean language by the first order markov source is carried out. The calculated entropy of the first order Markov source is also included. The results presented here are new data. The data can be useful in designing the keyboard pattern of terminal and the automatic discrimination of monosyllable in Korean language.

  • PDF

A Survey of Automatic Code Generation from Natural Language

  • Shin, Jiho;Nam, Jaechang
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.537-555
    • /
    • 2021
  • Many researchers have carried out studies related to programming languages since the beginning of computer science. Besides programming with traditional programming languages (i.e., procedural, object-oriented, functional programming language, etc.), a new paradigm of programming is being carried out. It is programming with natural language. By programming with natural language, we expect that it will free our expressiveness in contrast to programming languages which have strong constraints in syntax. This paper surveys the approaches that generate source code automatically from a natural language description. We also categorize the approaches by their forms of input and output. Finally, we analyze the current trend of approaches and suggest the future direction of this research domain to improve automatic code generation with natural language. From the analysis, we state that researchers should work on customizing language models in the domain of source code and explore better representations of source code such as embedding techniques and pre-trained models which have been proved to work well on natural language processing tasks.

Perspective Coherence in Simultaneous Interpreting - with Reference to German-Korean Interpreting - (동시통역과 시각적 응집성 - 독한 통역을 중심으로 -)

  • Ahn In-Kyoung
    • Koreanishche Zeitschrift fur Deutsche Sprachwissenschaft
    • /
    • v.9
    • /
    • pp.169-193
    • /
    • 2004
  • In simultaneous interpreting, if the syntactic structure of the source language and the target language are very different, interpreters have to wait before being able to reformulate the source text segments into a meaningful utterance in target language. It is inevitable to adapt the target language structure to that of the source language so as not to unduly increase the memory load and to minimize the pause. While such adaptation enables simultaneous interpretating, it results in damaging the perspective coherence of the text. Discovering when such perspective coherence is impaired, and how the problem can be relieved, will enable interpreters to enhance their performance. This paper analyses the reasons for perspective coherence damage by looking at some examples of German-Korean simultaneous interpreting.

  • PDF

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of information and communication convergence engineering
    • /
    • v.18 no.1
    • /
    • pp.33-38
    • /
    • 2020
  • Elasticsearch is an open source search and analytics engine that can search petabytes of data in near real time. It is designed as a distributed system horizontally scalable and highly available. It provides RESTful APIs, thereby making it programming-language agnostic. Full text search of multilingual text requires language-specific analyzers and field mappings appropriate for indexing and searching multilingual text. Additionally, a language detector can be used in conjunction with the analyzers to improve the multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens and language detector plugins that can determine the language of the given text. This study investigates three different approaches to index and search Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based), and identifies the advantages of the language detector-based approach compared to the other two.

A Translator of MUSS-80 for CYBER-72l

  • 이용태;이은구
    • Communications of the Korean Institute of Information Scientists and Engineers
    • /
    • v.1 no.1
    • /
    • pp.23-35
    • /
    • 1983
  • In its global meaning language translation refers to the process whereby a program which is executable in one computer can be executed in another computer directly to obtain the same result. There are four different ways of approaching translation. The first way is translation by a Translator or a Compier, the second way is Interpretation, the third way is Simulation, the last way is Emulation. This paper introduces the M-C Translator which was designed as the first way of translation. The MUSS 80 language (the subsystem of the UNIVAC Solid State 80 S-4 assembly language system) was chosen as the source language which includes forty-three instructions, using the CYBER COMPASS as the object language. The M-C translator is a two pass translator and is a two pas translator and es written in Fortran Extended language. For this M-C Translation, seven COMPASS subroutines and a set of thirty-five macros were prepared. Each executable source instruction corresponds to a macro, so it will be a macro instruction within the object profram. Subroutines are used to retain and handle the source data representation the same way in the object program as in the source system, and are used to convert the decimal source data into the equivalent binary result into the equivalent USS-80digits before and after arithmetic operations. The source instructions can be classified into three categories. First, therd are some instructions which are meaningless in the object system and are therefore unnecessary to translate, and the remaining instructions should be translated. Second, There are some instructions are required to indicate dual address portions. Third, there are Three instructions which have overflow conditions, which are lacking in the remaining instructions. The construction and functions of the M-C Translator, are explained including some of the subroutines, and macros. The problems, difficulties and the method of solving them, and easier features on this translation are analysed. The study of how to save memory and time will be continued.

Research on Natural Language Processing Package using Open Source Software (오픈소스 소프트웨어를 활용한 자연어 처리 패키지 제작에 관한 연구)

  • Lee, Jong-Hwa;Lee, Hyun-Kyu
    • The Journal of Information Systems
    • /
    • v.25 no.4
    • /
    • pp.121-139
    • /
    • 2016
  • Purpose In this study, we propose the special purposed R package named ""new_Noun()" to process nonstandard texts appeared in various social networks. As the Big data is getting interested, R - analysis tool and open source software is also getting more attention in many fields. Design/methodology/approach With more than 9,000 R packages, R provides a user-friendly functions of a variety of data mining, social network analysis and simulation functions such as statistical analysis, classification, prediction, clustering and association analysis. Especially, "KoNLP" - natural language processing package for Korean language - has reduced the time and effort of many researchers. However, as the social data increases, the informal expressions of Hangeul (Korean character) such as emoticons, informal terms and symbols make the difficulties increase in natural language processing. Findings In this study, to solve the these difficulties, special algorithms that upgrade existing open source natural language processing package have been researched. By utilizing the "KoNLP" package and analyzing the main functions in noun extracting command, we developed a new integrated noun processing package "new_Noun()" function to extract nouns which improves more than 29.1% compared with existing package.

Extended pivot-based approach for bilingual lexicon extraction

  • Seo, Hyeong-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.38 no.5
    • /
    • pp.557-565
    • /
    • 2014
  • This paper describes the extended pivot-based approach for bilingual lexicon extraction. The basic features of the approach can be described as follows: First, the approach builds context vectors between a source (or target) language and a pivot language like English, respectively. This is the same as the standard pivot-based approach which is useful for extracting bilingual lexicons between low-resource languages such as Korean-French. Second, unlike the standard pivot-based approach, the approach looks for similar context vectors in a source language. This is helpful to extract translation candidates for polysemous words as well as lets the translations be more confident. Third, the approach extracts translation candidates from target context vectors through the similarity between source and target context vectors. Based on these features, this paper describes the extended pivot-based approach and does various experiments in a language pair, Korean-French (KR-FR). We have observed that the approach is useful for extracting the most proper translation candidate as well as for a low-resource language pair.

Simultaneous neural machine translation with a reinforced attention mechanism

  • Lee, YoHan;Shin, JongHun;Kim, YoungKil
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.775-786
    • /
    • 2021
  • To translate in real time, a simultaneous translation system should determine when to stop reading source tokens and generate target tokens corresponding to a partial source sentence read up to that point. However, conventional attention-based neural machine translation (NMT) models cannot produce translations with adequate latency in online scenarios because they wait until a source sentence is completed to compute alignment between the source and target tokens. To address this issue, we propose a reinforced learning (RL)-based attention mechanism, the reinforced attention mechanism, which allows a neural translation model to jointly train the stopping criterion and a partial translation model. The proposed attention mechanism comprises two modules, one to ensure translation quality and the other to address latency. Different from previous RL-based simultaneous translation systems, which learn the stopping criterion from a fixed NMT model, the modules can be trained jointly with a novel reward function. In our experiments, the proposed model has better translation quality and comparable latency compared to previous models.

A new approach technique on Speech-to-Speech Translation (신호의 복원된 위상 공간을 이용한 오디오 상황 인지)

  • Le, Thanh Hien;Lee, Sung-young;Lee, Young-Koo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.239-240
    • /
    • 2009
  • We live in a flat world in which globalization fosters communication, travel, and trade among more than 150 countries and thousands of languages. To surmount the barriers among these languages, translation is required; Speech-to-Speech translation will automate the process. Thanks to recent advances in Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), one can now utilize a system to translate a speech of source language to a speech of target language and vice versa in affordable manner. The three phase process establishes that the source speech be transcribed into a (set of) text of the source language (ASR) before the source text is translated into the target text (MT). Finally, the target speech is synthesized from the target text (TTS).

Bilingual lexicon induction through a pivot language

  • Kim, Jae-Hoon;Seo, Hyeong-Won;Kwon, Hong-Seok
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.37 no.3
    • /
    • pp.300-306
    • /
    • 2013
  • This paper presents a new method for constructing bilingual lexicons through a pivot language. The proposed method is adapted from the context-based approach, called the standard approach, which is well-known for building bilingual lexicons using comparable corpora. The main difference between the standard approach and the proposed method is how to represent context vectors. The former is to represent context vectors in a target language, while the latter in a pivot language. The proposed method is very simplified from the standard approach thereby. Furthermore, the proposed method is more accurate than the standard approach because it uses parallel corpora instead of comparable corpora. The experiments are conducted on a language pair, Korean and Spanish. Our experimental results have shown that the proposed method is quite attractive where a parallel corpus directly between source and target languages are unavailable, but both source-pivot and pivot-target parallel corpora are available.