• Title/Abstract/Keywords: Processing Language

Search results: 2,692

Document Ranking Method for High Precision Rate

  • Jeon, Mee-Sun
    • Korean Society for Language and Information (한국언어정보학회): Conference Proceedings
    • Language, Information and Computation: Proceedings of the 10th Pacific Asia Conference, Hong Kong, 1995
    • pp.255-260
    • 1995
  • PDF

Contextual Language Processing Model

  • Jitsuko, Igarashi
    • Korean Society for Language and Information (한국언어정보학회): Conference Proceedings
    • Proceedings of the '84 Matsuyama Workshop on Formal Grammar, 1985
    • pp.23-29
    • 1985
  • PDF

한국인의 영어처리의 기제 : 모국어처리와의 상호작용을 중심으로 (The Processing System of English for Korean : Focused on the Interaction with Native Language Processing)

  • 이창환;강봉경
    • Korean Institute of Information Scientists and Engineers, SIG on Language Engineering (한국정보과학회 언어공학연구회): Conference Proceedings (Hangul and Korean Information Processing)
    • The 16th Hangul, Language, and Cognition Conference, 2004
    • pp.240-247
    • 2004
  • This study examined how bilingual lexical access occurs with respect to phonological information in Koreans who use English as a second language. Two experiments were conducted to test the non-selective hypothesis, which holds that the phonological knowledge of both languages is activated simultaneously during bilingual processing, and the selective hypothesis, which holds that only one language's phonological knowledge is activated. The results showed that performance on Korean (Hangul) target stimuli (Experiment 2) was significantly affected by the phonological manipulation of English words presented as primes, and that the processing of English target stimuli (Experiment 1) tended to be affected by the phonological manipulation of Korean words presented as primes. This implies that when one of the two languages is being processed, the phonological knowledge of the other language is automatically activated, and that phonological information plays an important role in Koreans' processing of English as a second language.

  • PDF

A Simple Syntax for Complex Semantics

  • Lee, Kiyong
    • Korean Society for Language and Information (한국언어정보학회): Conference Proceedings
    • Language, Information, and Computation: Proceedings of the 16th Pacific Asia Conference, 2002
    • pp.2-27
    • 2002
  • As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment will be restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work heavily relies on Hausser's (1999, 2000) SLIM theory for language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving interactive multi-agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-type implementation language that was enriched for both programming and debugging purposes and that was made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.

  • PDF
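
The abstract above centers on time-linear, left-associative processing, in which each incoming word is combined with the single sentence start built so far. The following minimal Python sketch only illustrates that control strategy; it is not the Malaga/SLIM implementation the paper describes, and the rule contents and feature bookkeeping are invented for illustration.

```python
# Minimal illustration of time-linear (left-associative) processing:
# each new word is combined with the one "sentence start" built so far.
# The rule and the feature structure below are invented for illustration only.

def combine(sentence_start, next_word):
    """Left-associative combination step: extend the structure built so far."""
    result = dict(sentence_start)                      # copy the partial result
    result["words"] = result["words"] + [next_word]
    # Toy "semantic" bookkeeping: verbs become the relation, everything else an argument.
    if next_word.endswith("-verb"):
        result["relation"] = next_word
    else:
        result["arguments"] = result["arguments"] + [next_word]
    return result

def parse_time_linear(words):
    """Process the input strictly left to right, one word at a time."""
    result = {"words": [], "arguments": [], "relation": None}
    for word in words:
        result = combine(result, word)                 # incremental construction
    return result

if __name__ == "__main__":
    # Toy token stream standing in for a Korean sentence with semi-free word order.
    print(parse_time_linear(["John-subj", "book-obj", "read-verb"]))
```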

사용자와 실시간으로 감성적 소통이 가능한 한국어 챗봇 시스템 개발 (Development of a Korean chatbot system that enables emotional communication with users in real time)

  • 백성대;이민호
    • Journal of Sensor Science and Technology (센서학회지)
    • Vol. 30, No. 6
    • pp.429-435
    • 2021
  • In this study, the creation of emotional dialogue was investigated within the process of developing a robot's natural language understanding and emotional dialogue processing. Unlike English-based datasets, which are the mainstay of natural language processing, Korean-based datasets have several shortcomings. Therefore, in a situation where the Korean language base is insufficient, the Korean dataset should be dealt with in detail, and in particular the unique characteristics of the language should be considered. Hence, the first step is to base this study on a specific Korean dataset consisting of conversations on emotional topics. Subsequently, a model was built that learns to extract continuous dialogue features from a pre-trained language model in order to generate sentences while maintaining the context of the dialogue. To validate the model, a chatbot system was implemented, and meaningful results were obtained by recruiting external participants and conducting experiments. As a result, the proposed model, influenced by the dataset whose conversation topic was counseling, facilitated free and emotional communication with users, as if they were consulting with the chatbot. The results were analyzed to identify and explain the advantages and disadvantages of the current model. Finally, as a necessary element for reaching the aforementioned ultimate research goal, a discussion of areas for future study is presented.
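
The abstract above describes generating replies from a pre-trained language model while keeping the dialogue context. A minimal generation sketch follows, using the Hugging Face transformers API; the model checkpoint, prompt format, and decoding settings are assumptions for illustration, not the authors' actual system.

```python
# Sketch of the generation step described above: a pre-trained Korean language
# model continues an emotional dialogue while the conversation history is kept
# in the prompt. Checkpoint and decoding settings are assumptions, not the
# authors' system.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "skt/kogpt2-base-v2"  # assumed Korean GPT-2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def reply(dialogue_history):
    """Generate the next utterance given the dialogue so far."""
    prompt = "\n".join(dialogue_history) + "\n"        # keep context in the prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(
        input_ids,
        max_new_tokens=48,
        do_sample=True,            # sampling rather than greedy decoding
        top_p=0.92,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

print(reply(["오늘 너무 우울해.", "무슨 일이 있었는지 말해 줄래?"]))
```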

Burmese Sentiment Analysis Based on Transfer Learning

  • Mao, Cunli;Man, Zhibo;Yu, Zhengtao;Wu, Xia;Liang, Haoyuan
    • Journal of Information Processing Systems
    • Vol. 18, No. 4
    • pp.535-548
    • 2022
  • Using a rich-resource language to classify sentiments in a low-resource language is a popular subject of research in natural language processing. Burmese is a low-resource language. In light of the scarcity of labeled training data for sentiment classification in Burmese, in this study we propose a transfer-learning method for sentiment analysis that applies feature transfer from English sentiment data. The method generates a cross-language word-embedding representation of Burmese vocabulary to map Burmese text into the semantic space of English text. A model for classifying sentiments in English is then pre-trained using a convolutional neural network and an attention mechanism, and this network is shared with the Burmese sentiment analysis model. The parameters of the network layers are used to learn cross-language sentiment features, which are then transferred to the model that classifies sentiments in Burmese. Finally, the model is tuned using the labeled Burmese data. The results of the experiments show that the proposed method significantly improves the classification of sentiments in Burmese compared to a model trained using only a Burmese corpus.
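
The abstract above outlines a CNN-plus-attention classifier over cross-lingual word embeddings, pre-trained on English sentiment data and then fine-tuned on labeled Burmese data. Below is a minimal PyTorch sketch of that set-up; the layer sizes, attention pooling, and toy random batches are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of the transfer set-up: a CNN text classifier over cross-lingual word
# embeddings is pre-trained on English sentiment batches and then fine-tuned on
# a small labeled "Burmese" batch. All dimensions and data are toy assumptions.
import torch
import torch.nn as nn

class CnnAttnSentiment(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=100, n_classes=2):
        super().__init__()
        # Embedding table meant to hold cross-lingual word vectors, so English
        # and Burmese words live in the same semantic space.
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.attn = nn.Linear(n_filters, 1)          # simple attention scores
        self.out = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq_len, n_filters)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # (batch, seq_len)
        pooled = (h * weights.unsqueeze(-1)).sum(dim=1)            # attention pooling
        return self.out(pooled)

def train_step(model, optimizer, token_ids, labels):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(token_ids), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = CnnAttnSentiment(vocab_size=50_000)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # 1) Pre-train on toy, randomly generated "English" batches.
    for _ in range(3):
        ids = torch.randint(1, 50_000, (8, 20))
        labels = torch.randint(0, 2, (8,))
        train_step(model, opt, ids, labels)

    # 2) Fine-tune on a smaller "Burmese" batch; the shared embedding space is
    #    what lets the English-trained filters transfer.
    ids = torch.randint(1, 50_000, (4, 20))
    labels = torch.randint(0, 2, (4,))
    print("fine-tune loss:", train_step(model, opt, ids, labels))
```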

시공간 질의 처리 시스템의 설계 및 구현 (Design and Implementation of Spatiotemporal Query Processing Systems)

  • 이성종;김동호;류근호
    • The Transactions of the Korea Information Processing Society (한국정보처리학회논문지)
    • Vol. 6, No. 5
    • pp.1166-1176
    • 1999
  • Spatiotemporal databases support historical information as well as spatial management for various kinds of objects in the real world, and can be used efficiently in many applications such as geographic information systems, urban planning systems, and car navigation systems. However, it is difficult to represent historical operations efficiently with a conventional database query language for spatial objects. In terms of query-processing cost, this also degrades performance because of syntactic limitations innate in conventional query representation. In this paper, we therefore introduce a new query language, called STQL, which extends the most popular relational database query language, SQL. We also implement and evaluate a spatiotemporal query processing system that takes queries written in STQL and processes them in main memory.

  • PDF
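
The entry above concerns queries that combine spatial predicates with valid-time history. The paper's STQL syntax is not reproduced in this listing, so the sketch below only illustrates, in plain Python over in-memory records, the kind of spatiotemporal predicate such a query evaluates; the data model is an assumption for illustration.

```python
# Toy evaluation of a spatiotemporal predicate ("which objects were inside this
# rectangle during this time interval?") over in-memory object histories.
from dataclasses import dataclass

@dataclass
class Snapshot:
    x: float
    y: float
    t_start: int   # valid-time interval [t_start, t_end)
    t_end: int

@dataclass
class MovingObject:
    name: str
    history: list  # list of Snapshot

def inside(snap, x_min, y_min, x_max, y_max):
    """Spatial predicate: snapshot position lies inside the query rectangle."""
    return x_min <= snap.x <= x_max and y_min <= snap.y <= y_max

def during(snap, q_start, q_end):
    """Temporal predicate: valid-time interval overlaps the query interval."""
    return snap.t_start < q_end and q_start < snap.t_end

def spatiotemporal_query(objects, rect, interval):
    """Return names of objects that were inside `rect` at some time in `interval`."""
    (x_min, y_min, x_max, y_max), (q_start, q_end) = rect, interval
    return [
        obj.name
        for obj in objects
        if any(inside(s, x_min, y_min, x_max, y_max) and during(s, q_start, q_end)
               for s in obj.history)
    ]

cars = [
    MovingObject("car-1", [Snapshot(3.0, 4.0, 10, 20), Snapshot(9.0, 9.0, 20, 30)]),
    MovingObject("car-2", [Snapshot(50.0, 50.0, 0, 100)]),
]
print(spatiotemporal_query(cars, rect=(0, 0, 10, 10), interval=(15, 25)))  # ['car-1']
```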

Subword Neural Language Generation with Unlikelihood Training

  • Iqbal, Salahuddin Muhammad;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication
    • Vol. 12, No. 2
    • pp.45-50
    • 2020
  • A language model with neural networks is commonly trained with a likelihood loss so that the model learns the sequences of human text. State-of-the-art results have been achieved in various language generation tasks, e.g., text summarization, dialogue response generation, and text generation, by utilizing the language model's next-token output probabilities. Monotonous and boring outputs are a well-known problem of such models, yet only a few solutions have been proposed to address it. Several decoding techniques have been proposed to suppress repetitive tokens. Unlikelihood training approaches this problem by penalizing the probabilities of candidate tokens that have already been seen in previous steps. While the method successfully produces less repetitive generated text, it has a large memory consumption because training requires a large vocabulary. We effectively reduced the memory footprint by encoding words as sequences of subword units. Finally, we report results competitive with token-level unlikelihood training in several automatic evaluations compared to the previous work.
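
The abstract above refers to token-level unlikelihood training, in which tokens already seen in the previous context are penalized in addition to the usual likelihood loss. A minimal PyTorch sketch of that objective follows; the tensor shapes, candidate construction, and toy inputs are illustrative assumptions, not the authors' training code.

```python
# Sketch of a token-level unlikelihood objective: besides the usual likelihood
# term on the gold token, tokens already seen in the previous context are
# penalized by -log(1 - p(token)). Shapes and toy data are assumptions.
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, alpha=1.0, pad_id=0):
    """
    logits:  (batch, seq_len, vocab) next-token scores from a language model
    targets: (batch, seq_len) gold next tokens (subword ids)
    """
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard maximum-likelihood term.
    mle = F.nll_loss(log_probs.reshape(-1, log_probs.size(-1)),
                     targets.reshape(-1), ignore_index=pad_id)

    # Negative candidates at step t = all target tokens seen before step t.
    batch, seq_len, vocab = logits.shape
    candidates = torch.zeros(batch, seq_len, vocab)
    for t in range(1, seq_len):
        candidates[:, t].scatter_(1, targets[:, :t], 1.0)
    candidates.scatter_(2, targets.unsqueeze(-1), 0.0)  # never penalize the gold token
    candidates[..., pad_id] = 0.0                       # or padding

    one_minus = torch.clamp(1.0 - log_probs.exp(), min=1e-5)
    ul = -(torch.log(one_minus) * candidates).sum() / candidates.sum().clamp(min=1.0)

    return mle + alpha * ul

# Toy usage with random "model" outputs over a small subword vocabulary.
logits = torch.randn(2, 6, 100)
targets = torch.randint(1, 100, (2, 6))
print(unlikelihood_loss(logits, targets).item())
```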

언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석 (Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling)

  • 정의정;이영직
    • The Acoustical Society of Korea (한국음향학회): Conference Proceedings
    • Proceedings of the 2002 Summer Conference of the Acoustical Society of Korea, Vol. 21, No. 1
    • pp.87-90
    • 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model for a Korean broadcast news transcription system. The ultimate purpose of our Korean broadcast news transcription system is to recognize not interview speech but the anchors' and reporters' speech in broadcast news shows. However, the text corpus gathered for constructing the language model consists partly of interview sentences, approximately 15% of the whole. The characteristics of interview sentences differ in various respects from those of anchor and reporter speech, and therefore disturb anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model for the Korean broadcast news transcription system and statistically analyze the relationship between the size and balance of the text corpus by repeating the same experimental procedure while varying the size of the corpus.

  • PDF
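
The entry above measures how including interview sentences in the training corpus affects a language model aimed at anchor and reporter speech. The toy sketch below illustrates that comparison with a smoothed unigram model and invented sentences; the real system would use a full n-gram language model trained on broadcast news text.

```python
# Toy comparison: perplexity of held-out anchor-style text under a unigram LM
# trained with and without interview sentences. All sentences are invented.
import math
from collections import Counter

def train_unigram(sentences):
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1                       # +1 slot for unseen words
    # Add-one smoothing so unseen words get non-zero probability.
    return lambda w: (counts.get(w, 0) + 1) / (total + vocab)

def perplexity(model, sentences):
    words = [w for s in sentences for w in s.split()]
    log_prob = sum(math.log(model(w)) for w in words)
    return math.exp(-log_prob / len(words))

anchor_train = ["the president announced a new policy today",
                "the committee approved the budget this morning"]
interview_train = ["well you know I just feel really happy about it",
                   "um it was kind of a surprise to me honestly"]
anchor_heldout = ["the government announced the policy this morning"]

anchor_only = train_unigram(anchor_train)
with_interview = train_unigram(anchor_train + interview_train)

print("anchor-only corpus :", perplexity(anchor_only, anchor_heldout))
print("with interviews    :", perplexity(with_interview, anchor_heldout))
```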

보건의료 빅데이터에서의 자연어처리기법 적용방안 연구: 단어임베딩 방법을 중심으로 (A Study on the Application of Natural Language Processing in Health Care Big Data: Focusing on Word Embedding Methods)

  • 김한상;정여진
    • Health Policy and Management (보건행정학회지)
    • Vol. 30, No. 1
    • pp.15-25
    • 2020
  • While healthcare data sets include extensive information about patients, many researchers face limitations in analyzing them due to intrinsic characteristics such as heterogeneity, longitudinal irregularity, and noise. In particular, since the majority of medical history information is recorded as text codes, the use of such information has been limited by the high dimensionality of the explanatory variables. To address this problem, recent studies have applied word embedding techniques, originally developed for natural language processing, and obtained positive results in terms of dimensionality reduction and the accuracy of prediction models. This paper reviews deep learning-based natural language processing techniques (word embedding) and summarizes research cases that have used those techniques in the health care field. Finally, we propose a research framework for applying deep learning-based natural language processing to the analysis of domestic health insurance data.
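
The entry above applies word-embedding methods to medical code sequences so that high-dimensional text codes become dense features. A minimal sketch with gensim's Word2Vec follows; the ICD-10-style codes, hyperparameters, and per-patient averaging are illustrative assumptions, not the study's actual pipeline.

```python
# Toy word-embedding of diagnosis-code histories: each "sentence" is one
# patient's chronological sequence of codes, and Word2Vec turns each code into
# a low-dimensional vector instead of a one-hot indicator.
import numpy as np
from gensim.models import Word2Vec

patients = [
    ["E11", "I10", "E78", "I10"],   # diabetes, hypertension, dyslipidemia
    ["J45", "J30", "J45"],          # asthma, rhinitis
    ["E11", "E78", "N18"],          # diabetes, dyslipidemia, chronic kidney disease
    ["I10", "I25", "E78"],          # hypertension, ischemic heart disease
]

model = Word2Vec(
    sentences=patients,
    vector_size=16,   # dense embedding instead of one-hot codes
    window=5,
    min_count=1,
    sg=1,             # skip-gram
    epochs=200,
    seed=0,
)

# Codes that co-occur in similar histories end up close in the embedding space.
print(model.wv.most_similar("E11", topn=3))

# A patient-level feature vector for a prediction model: average of code vectors.
patient_vec = np.mean([model.wv[c] for c in patients[0]], axis=0)
print(patient_vec.shape)   # (16,)
```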