• Title/Abstract/Keywords: External Language Model

35 search results (processing time: 0.021 s)

Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

  • Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 6 / pp. 1692-1705 / 2024
  • In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of the pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model utilizes the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach, as it fails to capture language information from transcript data directly. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using just additional text data, avoiding the need for further intensive training of the extensive pre-trained ASR model. This is particularly advantageous for Korean, characterized as a low-resource language, which confronts a significant challenge due to limited resources of speech data and available ASR models. Initially, we validate the efficacy of training the n-gram model at the clause-level by contrasting its inference accuracy with that of the E2E ASR model when merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves enhanced domain adaptation accuracy compared to Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without necessitating additional training.
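
The Shallow Fusion baseline mentioned in the abstract combines ASR and external LM scores by log-linear interpolation. The following is a minimal sketch of that idea applied to an n-best list, not the paper's WFST method. Shallow fusion is normally applied inside beam search, and the function names, the toy unigram LM, and the weight value here are all assumptions made for illustration.

```python
import math

# Hypothetical toy unigram LM; a real system would use an n-gram or neural LM.
TOY_UNIGRAM = {"speech": 0.4, "recognition": 0.4, "wreck": 0.05, "ignition": 0.05}


def toy_lm_logprob(text, floor=1e-4):
    """Sum of unigram log-probabilities; unseen words get a small floor value."""
    return sum(math.log(TOY_UNIGRAM.get(word, floor)) for word in text.split())


def shallow_fusion_rescore(hypotheses, lm_logprob, lm_weight=0.3):
    """Rank (text, asr_logprob) pairs by asr_logprob + lm_weight * lm_logprob(text)."""
    scored = [
        (text, asr_lp + lm_weight * lm_logprob(text))
        for text, asr_lp in hypotheses
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# The acoustically preferred hypothesis loses once the LM score is added.
nbest = [("speech wreck ignition", -1.0), ("speech recognition", -1.2)]
best_text, best_score = shallow_fusion_rescore(nbest, toy_lm_logprob, lm_weight=0.5)[0]
```

Because only the LM term depends on text data, swapping in an LM trained on in-domain text changes the ranking without retraining the ASR model, which is the property the abstract relies on for domain adaptation.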

Development of a Korean chatbot system that enables emotional communication with users in real time

  • 백성대;이민호
    • 센서학회지 / Vol. 30, No. 6 / pp. 429-435 / 2021
  • In this study, the generation of emotional dialogue was investigated as part of developing a robot's natural language understanding and emotional dialogue processing. Unlike English-based datasets, which are the mainstay of natural language processing, Korean-based datasets have several shortcomings. Therefore, given that Korean language resources are insufficient, Korean datasets should be handled with care, and in particular the unique characteristics of the language should be considered. Hence, as a first step, this study is based on a specific Korean dataset consisting of conversations on emotional topics. Subsequently, a model was built that learns to extract continuous dialogue features from a pre-trained language model in order to generate sentences while maintaining the context of the dialogue. To validate the model, a chatbot system was implemented, and meaningful results were obtained by recruiting external subjects and conducting experiments. The proposed model was influenced by the dataset, whose conversation topic was consultation, and so facilitated free and emotional communication with users, as if they were consulting with a chatbot. The results were analyzed to identify and explain the advantages and disadvantages of the current model. Finally, areas for future study that are necessary to reach the ultimate research goal are discussed.

Robustness of Differentiable Neural Computer Using Limited Retention Vector-based Memory Deallocation in Language Model

  • Lee, Donghyun;Park, Hosung;Seo, Soonshin;Son, Hyunsoo;Kim, Gyujin;Kim, Ji-Hwan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 3 / pp. 837-852 / 2021
  • Recurrent neural network (RNN) architectures have been used for language modeling (LM) tasks that require learning long-range word or character sequences. However, the RNN architecture still suffers from unstable gradients on long-range sequences. To address this issue, attention mechanisms have been used, showing state-of-the-art (SOTA) performance in all LM tasks. A differentiable neural computer (DNC) is a deep learning architecture that uses an attention mechanism: a neural network augmented with a content-addressable external memory. However, in the write operation, some information unrelated to the input word remains in memory. Moreover, DNCs have been found to perform poorly with low numbers of weight parameters. Therefore, we propose a robust memory deallocation method using a limited retention vector. The limited retention vector determines whether the network increases or decreases its usage of information in external memory according to a threshold. We experimentally evaluate the robustness of a DNC implementing the proposed approach according to the size of the controller and external memory on the enwik8 LM task. When we decreased the number of weight parameters by 32.47%, the proposed DNC showed a low bits-per-character (BPC) degradation of 4.30%, demonstrating the effectiveness of our approach in language modeling tasks.
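
For context, the retention vector and usage update of the standard DNC can be sketched as below. This is the baseline formulation, not the paper's limited retention vector, whose thresholding rule is not specified in the abstract and is therefore not reproduced here. The function names and the toy values are assumptions for illustration.

```python
def retention_vector(free_gates, read_weights):
    """Standard DNC retention: psi[j] = prod_i (1 - f_i * w_r[i][j]).

    free_gates: one value in [0, 1] per read head.
    read_weights: one weighting over memory slots per read head.
    """
    n_slots = len(read_weights[0])
    psi = [1.0] * n_slots
    for gate, weights in zip(free_gates, read_weights):
        for j in range(n_slots):
            psi[j] *= 1.0 - gate * weights[j]
    return psi


def update_usage(prev_usage, prev_write_weights, psi):
    """Standard DNC usage update: u = (u_prev + w_w - u_prev * w_w) * psi."""
    return [
        (u + w - u * w) * p
        for u, w, p in zip(prev_usage, prev_write_weights, psi)
    ]


# Two read heads, three memory slots. Head 0 frees the slot it just read.
psi = retention_vector(
    free_gates=[1.0, 0.0],
    read_weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
usage = update_usage(
    prev_usage=[0.9, 0.5, 0.0],
    prev_write_weights=[0.0, 0.0, 1.0],
    psi=psi,
)
```

In this toy case slot 0 is fully deallocated, slot 1 keeps its usage because its head's free gate is closed, and slot 2 becomes fully used after the write.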

LFMMI-based acoustic modeling by using external knowledge

  • 박호성;강요셉;임민규;이동현;오준석;김지환
    • 한국음향학회지 / Vol. 38, No. 5 / pp. 607-613 / 2019
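  • This paper proposes a lattice-free maximum mutual information (LF-MMI) based acoustic modeling method that uses external knowledge. External knowledge refers to text data other than the training data used for the acoustic model. LF-MMI is an objective function for optimizing the training of deep neural networks (DNNs) and shows high performance in discriminative training. In LF-MMI, a phoneme sequence serves as the prior probability for computing the DNN's posterior probability. We propose using external knowledge in the phoneme modeling that provides the prior probability in the LF-MMI objective, thereby reducing the possibility of overfitting and improving acoustic model performance. An LF-MMI model whose prior probability was generated using external knowledge showed a 14% relative performance improvement over the conventional LF-MMI.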
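
For orientation, the MMI objective that LF-MMI builds on is commonly written as follows. This is the textbook form rather than an equation taken from the paper, and the symbols are notation assumed here: $O_u$ is the observation sequence of utterance $u$, $w_u$ its reference transcript, $M_w$ the model sequence for $w$, and $\theta$ the DNN parameters.

$$\mathcal{F}_{\mathrm{MMI}}(\theta) = \sum_{u} \log \frac{p_\theta(O_u \mid M_{w_u})\, P(w_u)}{\sum_{w} p_\theta(O_u \mid M_w)\, P(w)}$$

In LF-MMI the denominator sum runs over phone sequences, and the prior $P(w)$ comes from a phone-level language model. That prior is the component the abstract describes estimating with external knowledge.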

A study on Korean multi-turn response generation using generative and retrieval model

  • 이호동;이종민;서재형;장윤나;임희석
    • 한국융합학회논문지 / Vol. 13, No. 1 / pp. 13-21 / 2022
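  • Recent deep learning-based natural language processing research shows excellent performance in most NLP fields through pre-trained language models. In particular, auto-encoder-based language models have proven their performance and usefulness in a variety of Korean language understanding tasks. However, decoder-based Korean generative models still struggle even with simple sentence generation tasks, and detailed research and trainable data are lacking in the dialogue domain, where generative models are most commonly used. This paper therefore builds multi-turn dialogue data for Korean generative models, improves the dialogue ability of a generative model through transfer learning, and compares and analyzes the resulting performance. In addition, we propose a method that uses a retrieval model to extract recommended response candidates from external knowledge, compensating for the generative model's insufficient dialogue generation ability.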
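
The retrieval step described in the abstract can be illustrated with a very small stand-in. The abstract does not say which retrieval model the authors use, so this sketch swaps in bag-of-words cosine similarity purely for illustration. The knowledge base, function names, and `top_k` value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(context, knowledge_base, top_k=2):
    """Return the top_k knowledge sentences most similar to the dialogue context."""
    query = Counter(context.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda sentence: cosine(query, Counter(sentence.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical external knowledge; a real system would index a large corpus.
KNOWLEDGE_BASE = [
    "the museum opens at nine in the morning",
    "tickets for the museum cost five dollars",
    "the weather is sunny today",
]
candidates = retrieve_candidates("when does the museum open", KNOWLEDGE_BASE, top_k=2)
```

The retrieved candidates would then be offered alongside, or as input to, the generative model's own response.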

Opacity and Presupposition Inheritance in Belief Contexts

  • Kim, Kyoung-Ae
    • 한국언어정보학회지:언어와정보 / Vol. 3, No. 2 / pp. 67-83 / 1999
  • This paper attempts to account for the problems of intensional opacity of referring expressions and presupposition inheritance (PI) in belief contexts from a discourse perspective. I discuss Jaszczolt's discourse model, based on DRT, which accounts for belief reports. Jaszczolt analyzes referring expressions in terms of three readings (de re, de dicto$_1$, and de dicto$_2$) and attempts to represent the differences between them in the DRSs via different anchoring modes: external anchoring, formal anchoring, and nonanchoring. I propose an extended model to account for presupposition inheritance in belief contexts and attempt to analyze Korean data based on this model. The differences in PI and in the DRS representations that are induced by the different complement types, ...ko (mitta) and ...kesul (mitta), are discussed.

The Relationship between the Mental Model and the Depictive Gestures Observed in the Explanations of Elementary School Students about the Reason Why Seasons Change

  • 김나영;양일호;고민석
    • 대한지구과학교육학회지 / Vol. 7, No. 3 / pp. 358-370 / 2014
  • The purpose of this study is to analyze the relationship between the mental models and the depictive gestures observed in elementary school students' explanations of why the seasons change. Analysis of the gestures for each mental model showed that students of the CM type remembered their mental model as "motion" and used more "exophoric" gestures, in which the gesture itself served as language. Students of the CF type remembered it as "writings or pictures" and used metaphoric gestures when explaining some alternative concepts. Students of the CF-UM type explained in detail verbally and showed many "lexical" gestures. Analysis of the depictive gestures showed that, even within sub-categories such as rotation, revolution, and meridian altitude, students used a wide variety of gestures, such as pointing with fingers, palms, arms, ball-point pens, and fists, or drawing and spinning. Through these gestures we could check the students' conceptual understanding. In addition, analysis of inconsistencies among external representations (verbal language and gesture, writing and gesture, and picture and gesture) showed that gestures can help in understanding students' mental models and sometimes express information that verbal explanations or pictures do not. We also examined two participants who showed conspicuous differences. One appeared to be wrong because he used his own expressions, yet his gestures were precise; the other appeared to be accurate, but analysis of his gestures revealed whimsical concepts.

An Empirical Study on the Factors Affecting Diffusion of Object-Oriented Technology

  • 이민화
    • 한국정보시스템학회지:정보시스템연구 / Vol. 10, No. 1 / pp. 97-126 / 2001
  • Object-orientation has been proposed as a promising software process innovation to improve software productivity and quality. It has not been clearly understood, however, what factors influence the diffusion of object-oriented technology in organizations. A research model was formulated and hypotheses were generated based on the literature on information technology implementation and software process innovation. To test the research hypotheses, a questionnaire survey was conducted. The results, based on 121 responses from Korean companies, revealed that project characteristics, use of external experts, and number of development projects are significantly related to the diffusion of object-oriented analysis and design and of object-oriented programming. The presence of an innovation champion is positively related to the diffusion of object-oriented analysis and design, whereas it is not related to the diffusion of object-oriented programming languages. Only project complexity was significantly related to the diffusion of visual programming languages. On the other hand, organizational size was not significantly related to any object-oriented technology in this study.

Visual Dynamics Model for 3D Text Visualization

  • Lim, Sooyeon
    • International Journal of Contents / Vol. 14, No. 4 / pp. 86-91 / 2018
  • Text has evolved along with the history of art as a means of communicating human intentions and emotions. In addition, text visualization artworks have been combined with the social form and contents of new media to produce social messages and related meanings. Recently, in text visualization artworks combined with digital media, the forms of communication with viewers have become instant and interactive, and viewers actively participate in creating artworks through direct engagement. Interactive text visualization, which adds the viewer's interaction, generates external dynamics from text shapes and internal dynamics from the embedded meanings of the text. The purpose of this study is to propose a visual dynamics model that expresses the dynamics of text and to implement a text visualization system based on the model. The system deconstructs imaged text to create an interactive text visualization that reacts to the viewer's gestures in real time. Visual transformation synchronized with the viewer's intentions prevents the text from being confined to its interpretation as language symbols and extends its possible meanings. The text, visualized in various forms, shows visual dynamics whose meaning is interpreted according to the cultural background of the viewer.

Design of Extended Database Query Language Based on SMIL 2.0

  • 이중화;문경희;윤홍원
    • 한국정보통신학회논문지 / Vol. 9, No. 7 / pp. 1555-1560 / 2005
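  • Until now, query results have generally been presented through external tools or report generators. Because the methods for authoring and storing presentations are not standardized, using query results in other applications is difficult. A query language for multimedia data therefore needs a standardized way to define presentations. In this paper, we extend SQL based on SMIL (Synchronized Multimedia Integration Language) 2.0, the multimedia presentation standard of the W3C (World Wide Web Consortium), so that users can readily query multimedia data and author presentations of the query results.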