• Title/Summary/Keyword: Multilingual processing

Preprocessing of the French Sentence for Multilingual Information Processing (다국어 정보처리를 위한 불어 전처리에 관한 연구)

  • Seo, Rae-Won;Park, Se-Won;Yu, Seong-Jun
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1132-1140 / 2000
  • This paper presents a method for improving the efficiency of a morphosyntactic analyzer for French, from the standpoint of multilingual information processing. The study shows that the method prevents useless additional word types from being generated, by decomposing words that were created by morphological amalgamation. The findings also suggest that preprocessing is needed to reduce the load on the morphosyntactic analysis algorithm. In addition, general rules are proposed for splitting word forms and decomposing complex sentences.

A multilingual grammar model of honorification: using the HPSG and MRS formalism

  • Song, Sanghoun
    • Language and Information / v.20 no.1 / pp.25-49 / 2016
  • Honorific forms express the speaker's social attitude to others and also indicate the social ranks and level of intimacy of the participants in the discourse. In a cross-linguistic perspective of grammar engineering, modelling honorification has been regarded as a key strategy for improving language processing applications. Using the HPSG and MRS formalism, this article provides a multilingual grammar model of honorification. The present study incorporates the honorific information into the Meaning Representation System (MRS) via Individual Constraints (ICONS), and then conducts an evaluation to see if the model contributes to semantics-based language processing.

Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G.;Jayanna, H.S.
    • Journal of Information Processing Systems / v.13 no.4 / pp.928-940 / 2017
  • State-of-the-art speaker recognition systems may work better for English. However, when the same system is used to recognize speakers of other languages, it may perform poorly. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The two classifiers differ in their modeling techniques: the former is based on a probabilistic approach and the latter on the fine-tuning of neurons. Since the approaches differ, each modeling technique identifies a different set of speakers for the same database. Therefore, combining the classifiers' decisions may improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as the features, and monolingual and cross-lingual speaker identification studies are conducted using NIST-2003 and our own database. The experimental results show that the combined system improves performance by nearly 10% compared with the individual classifiers.
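
The abstract does not say how the two classifiers' decisions are combined. As one possible illustration, the sketch below uses weighted score-level fusion: each classifier's per-speaker scores are min-max normalized and then averaged. The function names, the `weight` parameter, and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two classifiers become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {spk: 0.0 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}


def fuse_decisions(gmm_scores: Dict[str, float],
                   lvq_scores: Dict[str, float],
                   weight: float = 0.5) -> str:
    """Return the speaker with the highest weighted sum of normalized scores.

    Assumption: higher means better in both inputs (for example GMM-UBM
    log-likelihoods and negated LVQ distances) and both cover the same speakers.
    """
    g = min_max_normalize(gmm_scores)
    l = min_max_normalize(lvq_scores)
    fused = {spk: weight * g[spk] + (1.0 - weight) * l[spk] for spk in g}
    return max(fused, key=fused.get)


# Made-up scores: the GMM-UBM narrowly prefers spk_b, the LVQ clearly prefers spk_a.
gmm = {"spk_a": -41.2, "spk_b": -40.9, "spk_c": -45.0}
lvq = {"spk_a": -0.8, "spk_b": -2.5, "spk_c": -1.1}
print(fuse_decisions(gmm, lvq))
```

With equal weights the fused decision follows the classifier that is more confident after normalization; setting `weight=1.0` falls back to the GMM-UBM decision alone.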

Research on Recent Quality Estimation (최신 기계번역 품질 예측 연구)

  • Eo, Sugyeong;Park, Chanjun;Moon, Hyeonseok;Seo, Jaehyung;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.37-44 / 2021
  • Quality estimation (QE) can evaluate the quality of machine translation output even for users who do not know the target language, and this broad applicability underlines the need for QE research. A QE shared task is held every year at the Conference on Machine Translation (WMT), and recent work mainly applies pretrained language models (PLMs). In this paper, we survey the QE task and its research trends, and we summarize the features of PLMs. In addition, we use a multilingual BART model, which has not yet been applied to QE, and compare it with existing approaches such as XLM, multilingual BERT, and XLM-RoBERTa. The experiments identify which PLM is most effective when applied to QE and indicate that the multilingual BART model could be applied to the QE task.

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1409-1425 / 2021
  • In a wide-ranging information society, fast and easy access to information in the language of one's choice is indispensable, and it may be provided by various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP, and anaphora resolution is an essential part of NLP applications. Anaphoric links need to be properly interpreted for a clear understanding of natural languages. For this purpose, a mechanism is desirable for identifying and resolving these naturally occurring anaphoric links. In this paper, a framework based on Hobbs' syntactic approach and a system developed by Lappin & Leass is proposed for resolving reflexive anaphoric links in Urdu text documents. Generally, the anaphora resolution process takes three main steps: identification of the anaphor, location of the candidate antecedent(s), and selection of the appropriate antecedent. The proposed framework explores the syntactic structure of reflexive anaphors to find features for constructing heuristic rules, from which an algorithm for resolving these anaphoric references is developed. The system takes Urdu text containing reflexive anaphors as input and outputs Urdu text with resolved reflexive anaphoric links. Despite the scarcity of Urdu resources, our results are encouraging. The proposed framework can be utilized in multilingual NLP (m-NLP) applications.
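
The three steps named in the abstract (identify the anaphor, locate candidate antecedents, select one) can be sketched as a toy pipeline. This is only an illustration on pre-tagged English tokens, not the paper's Urdu framework: the reflexive list, the tag names, and the "first preceding noun" selection rule are all simplifying assumptions.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, coarse part-of-speech tag)

# Toy English reflexive list (assumption; the paper works on Urdu reflexives).
REFLEXIVES = {"himself", "herself", "itself", "themselves"}


def identify_anaphors(tokens: List[Token]) -> List[int]:
    """Step 1: find the positions of reflexive anaphors."""
    return [i for i, (word, _) in enumerate(tokens) if word.lower() in REFLEXIVES]


def locate_candidates(tokens: List[Token], anaphor_idx: int) -> List[int]:
    """Step 2: collect nouns and proper nouns that precede the anaphor."""
    return [i for i in range(anaphor_idx) if tokens[i][1] in {"NOUN", "PROPN"}]


def select_antecedent(candidates: List[int]) -> Optional[int]:
    """Step 3: pick the first candidate, a crude stand-in for the clause subject."""
    return candidates[0] if candidates else None


def resolve(tokens: List[Token]) -> Dict[int, Optional[int]]:
    """Map each anaphor position to its selected antecedent position."""
    return {i: select_antecedent(locate_candidates(tokens, i))
            for i in identify_anaphors(tokens)}


sentence = [("Sara", "PROPN"), ("blamed", "VERB"), ("herself", "PRON"),
            ("for", "ADP"), ("the", "DET"), ("delay", "NOUN")]
for anaphor, antecedent in resolve(sentence).items():
    print(sentence[anaphor][0], "->", sentence[antecedent][0])
```

A real resolver would replace step 3 with syntactic heuristics and agreement checks of the kind the abstract describes; the selection rule here fails as soon as a sentence has more than one clause.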

A study on the aspect-based sentiment analysis of multilingual customer reviews (다국어 사용자 후기에 대한 속성기반 감성분석 연구)

  • Sungyoung Ji;Siyoon Lee;Daewoo Choi;Kee-Hoon Kang
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.515-528 / 2023
  • With the growth of the e-commerce market, consumers increasingly rely on user reviews to make purchasing decisions. Consequently, researchers are actively studying how to analyze these reviews effectively. Among the various methods of sentiment analysis, the aspect-based approach, which examines user reviews from multiple angles rather than relying solely on simple positive or negative sentiment, is gaining widespread attention. One methodology for aspect-based sentiment analysis uses transformer-based models, the latest natural language processing technology. In this paper, we apply such models to conduct aspect-based sentiment analysis of multilingual user reviews on two real datasets. Specifically, we use restaurant data from the SemEval 2016 public dataset and multilingual user review data from the cosmetic domain. We compare the performance of transformer-based models for aspect-based sentiment analysis and apply various methodologies to improve their performance. Models using multilingual data are expected to be highly useful because they can analyze multiple languages in one model, without building a separate model for each language.

Multilingual Story Link Detection based on Properties of Event Terms (사건 어휘의 특성을 반영한 다국어 사건 연결 탐색)

  • Lee Kyung-Soon
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.81-90 / 2005
  • In this paper, we propose a novel approach that models multilingual story link detection by using features such as timelines and multilingual spaces as weighting components, giving distinctive weights to terms related to events. On timelines, term significance is calculated by comparing the term distribution of the documents on a given day with that of the total document collection, and it is used to represent the document vectors for that day. Since two languages can provide more information than one, term significance is measured in each language space and used as a bridge to refer to the other language space. Evaluated on Korean and Japanese news articles, our method achieved 14.3% and 16.7% improvement for mono- and multi-lingual story pairs, and for multilingual story pairs, respectively. By measuring the space density, the proposed weighting components are verified with a high density of intra-event stories and a low density of inter-event stories. This result indicates that the proposed method is helpful for multilingual story link detection.
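
The abstract does not give the formula used for timeline-based term significance. As a rough illustration of comparing a day's term distribution with the whole collection, the sketch below uses a smoothed ratio of relative frequencies. That measure, the function name, and the sample documents are assumptions made for this example, not the paper's method.

```python
from collections import Counter
from typing import Dict, List


def term_significance(day_docs: List[List[str]],
                      all_docs: List[List[str]],
                      smoothing: float = 1.0) -> Dict[str, float]:
    """Score each term seen on a given day by how over-represented it is that day.

    Assumption: day_docs is a subset of all_docs, and each document is a token list.
    A score above 1 means the term is relatively more frequent on that day than in
    the collection as a whole.
    """
    day_counts = Counter(t for doc in day_docs for t in doc)
    all_counts = Counter(t for doc in all_docs for t in doc)
    day_total = sum(day_counts.values())
    all_total = sum(all_counts.values())
    vocab = len(all_counts)

    scores = {}
    for term, count in day_counts.items():
        p_day = (count + smoothing) / (day_total + smoothing * vocab)
        p_all = (all_counts[term] + smoothing) / (all_total + smoothing * vocab)
        scores[term] = p_day / p_all
    return scores


collection = [
    ["earthquake", "rescue", "news"],
    ["election", "news"],
    ["sports", "news"],
    ["election", "vote", "news"],
]
today = collection[:1]
sig = term_significance(today, collection)
print(sorted(sig, key=sig.get, reverse=True))
```

Scores of this kind could then scale the term weights in that day's document vectors, so that event-specific terms count for more than terms that appear every day.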

A Study on a Multilingual name Retrieval (다중 언어 인명 검색에 관한 연구)

  • Cho, Young-Hwa;Song, Jae-Yong;Ryu, Keun-Ho
    • The Transactions of the Korea Information Processing Society / v.5 no.9 / pp.2271-2280 / 1998
  • In this paper, we propose a method to retrieve English-written Korean names efficiently, and we design a multilingual name retrieval system. It is very difficult to retrieve English-written Korean names in typical IR systems. For example, "홍길동" is written in English in various forms such as "Hong, gildong", "Gildong Hong", "Hong kil dong", and so on. We not only propose a rule-based query expansion method to retrieve English-written Korean names efficiently but also design a multilingual name retrieval system that consists of a query classifier, exception handler, query expander, query executor, exception list, and rulebase. Finally, we try to show that English-written Korean names can be retrieved efficiently with a rule-based name generator.
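
A rule-based query expander of the kind described can be sketched as follows. The variant table and the ordering rules are hypothetical and only cover the "홍길동" forms quoted in the abstract; the paper's actual rulebase is not given there.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical rule table: alternative romanizations per given-name syllable.
SYLLABLE_VARIANTS: Dict[str, List[str]] = {"gil": ["gil", "kil"]}


def expand_name(surname: str, given: List[str]) -> Set[str]:
    """Generate lowercase query variants for a romanized Korean name.

    Rules applied (all assumptions for illustration):
      1. each given-name syllable may take an alternative spelling,
      2. given-name syllables may be joined or separated by spaces,
      3. the surname may come first (with or without a comma) or last.
    """
    options = [SYLLABLE_VARIANTS.get(s, [s]) for s in given]
    variants: Set[str] = set()
    for combo in product(*options):
        for given_form in ("".join(combo), " ".join(combo)):
            variants.add(f"{surname}, {given_form}")
            variants.add(f"{surname} {given_form}")
            variants.add(f"{given_form} {surname}")
    return {v.lower() for v in variants}


queries = expand_name("hong", ["gil", "dong"])
for q in sorted(queries):
    print(q)
```

Matching against lowercased index terms lets one input name retrieve records stored as "Hong, gildong", "Gildong Hong", or "Hong kil dong".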

Multilingual Web-Page Providing system (언어별 웹 화면 제공 시스템)

  • Jin, Sung-Geun;Lee, Jae-Kyung
    • Proceedings of the Korea Information Processing Society Conference / 2000.10b / pp.1545-1548 / 2000
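  • Among the various problems that can arise when Internet users follow links to the web pages they want, this paper addresses the language barrier: it designs a system to resolve that barrier and presents a way to solve the language problem using the system. The system estimates the user's language from the user's DNS server and a local database, and then provides the user with a web page written in the estimated language. As a result, it can offer several conveniences, such as resolving the language barriers that Internet users face, reducing unnecessary clicks on web pages, increasing the number of web page links, and saving time.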

Construction of Korean Wordnet "KorLex 1.5" (한국어 어휘의미망 "KorLex 1.5"의 구축)

  • Yoon, Ae-Sun;Hwang, Soon-Hee;Lee, Eun-Ryoung;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications / v.36 no.1 / pp.92-108 / 2009
  • The Princeton WordNet (PWN), developed over the 20 years since the mid-1980s, aimed to represent the mental lexicon inside the human mind. Its potential, applicability, and portability have been appreciated more in the fields of NLP and KE than in cognitive psychology. Semantic and knowledge processing is indispensable for obtaining useful information through human languages in CMC and HCI environments. The PWN can provide such NLP-based systems with 'concrete' semantic units and their network. With reference to the PWN, about 50 wordnets for different languages have been developed over the last 10 years, and they enable a variety of multilingual processing applications. This paper describes the PWN-referenced Korean wordnet KorLex 1.5, which was developed from 2004 to 2007 and currently contains about 130,000 synsets and 150,000 word senses for nouns, verbs, adjectives, adverbs, and classifiers.