Search | Korea Science

Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
- KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.6 / pp.1692-1705 / 2024
In this paper, we present a method that integrates a Grammar Transducer as an external language model to improve the accuracy of a pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this approach has an inherent limitation: CTC does not directly capture language information from transcript data. To overcome this limitation, we propose a fusion approach that combines the E2E ASR model with a clause-level n-gram language model converted into a Weighted Finite-State Transducer (WFST). This approach improves the model's accuracy and enables domain adaptation using only additional text data, without further costly training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language for which speech data and available ASR models are limited. First, we validate the effectiveness of training the n-gram model at the clause level by comparing the inference accuracy of the E2E ASR model fused with it against that of the model fused with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, a previously proposed method for combining an external language model with an E2E ASR model without additional training.
https://doi.org/10.3837/tiis.2024.06.015

Development of a Korean chatbot system that enables emotional communication with users in real time (사용자와 실시간으로 감성적 소통이 가능한 한국어 챗봇 시스템 개발)

Baek, Sungdae;Lee, Minho
- Journal of Sensor Science and Technology / v.30 no.6 / pp.429-435 / 2021
In this study, the generation of emotional dialogue was investigated as part of developing a robot's natural language understanding and emotional dialogue processing. Compared with English datasets, which dominate natural language processing, Korean datasets have several shortcomings. Therefore, because Korean language resources are insufficient, Korean datasets must be handled carefully, with particular attention to the unique characteristics of the language. Hence, this study is first based on a specific Korean dataset consisting of conversations on emotional topics. Subsequently, a model was built that learns to extract continuous dialogue features from a pre-trained language model and generates sentences while maintaining the context of the dialogue. To validate the model, a chatbot system was implemented, and meaningful results were obtained by recruiting external participants and conducting experiments. The proposed model was influenced by the dataset, whose conversation topic was counseling, so that it facilitated free and emotional communication with users as though they were consulting the chatbot. The results were analyzed to identify and explain the advantages and disadvantages of the current model. Finally, directions for future studies needed to reach the ultimate research goal are discussed.
https://doi.org/10.46670/JSST.2021.30.6.429

Robustness of Differentiable Neural Computer Using Limited Retention Vector-based Memory Deallocation in Language Model

Lee, Donghyun;Park, Hosung;Seo, Soonshin;Son, Hyunsoo;Kim, Gyujin;Kim, Ji-Hwan
- KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.3 / pp.837-852 / 2021
Recurrent neural network (RNN) architectures have been used for language modeling (LM) tasks that require learning long-range word or character sequences. However, the RNN architecture still suffers from unstable gradients on long sequences. To address this issue, attention mechanisms have been used, showing state-of-the-art (SOTA) performance in all LM tasks. A differentiable neural computer (DNC) is a deep learning architecture that uses an attention mechanism: a neural network augmented with a content-addressable external memory. However, during the write operation, some information unrelated to the input word remains in memory. Moreover, DNCs have been found to perform poorly with small numbers of weight parameters. Therefore, we propose a robust memory deallocation method using a limited retention vector, which determines, according to a threshold, whether the network increases or decreases its usage of information in external memory. We experimentally evaluate the robustness of a DNC implementing the proposed approach on the enwik8 LM task as the sizes of the controller and external memory vary. When the number of weight parameters was reduced by 32.47%, the proposed DNC showed a bits-per-character (BPC) degradation of only 4.30%, demonstrating the effectiveness of our approach in language modeling tasks.
https://doi.org/10.3837/tiis.2021.03.002

LFMMI-based acoustic modeling by using external knowledge (External knowledge를 사용한 LFMMI 기반 음향 모델링)

Park, Hosung;Kang, Yoseb;Lim, Minkyu;Lee, Donghyun;Oh, Junseok;Kim, Ji-Hwan
- The Journal of the Acoustical Society of Korea / v.38 no.5 / pp.607-613 / 2019
This paper proposes LF-MMI (Lattice-Free Maximum Mutual Information)-based acoustic modeling using external knowledge for speech recognition. Here, external knowledge refers to text data other than the training data used for the acoustic model. LF-MMI, an objective function for training DNNs (Deep Neural Networks), performs well in discriminative training. In LF-MMI, phoneme probabilities are used as the prior probability when predicting the posterior probability of the DNN-based acoustic model. We propose using external knowledge to train the prior probability model, thereby improving the DNN-based acoustic model. The proposed method achieves a 14% relative improvement over the conventional LF-MMI-based model.
https://doi.org/10.7776/ASK.2019.38.5.607

A study on Korean multi-turn response generation using generative and retrieval model (생성 모델과 검색 모델을 이용한 한국어 멀티턴 응답 생성 연구)

Lee, Hodong;Lee, Jongmin;Seo, Jaehyung;Jang, Yoonna;Lim, Heuiseok
- Journal of the Korea Convergence Society / v.13 no.1 / pp.13-21 / 2022
Recent deep learning-based research with pre-trained language models shows excellent performance in most natural language processing (NLP) fields. In particular, auto-encoder-based language models have proven their performance and usefulness in various areas of Korean language understanding. However, decoder-based Korean generative models struggle even to generate simple sentences. Moreover, there is little detailed research or data in the field of conversation, where generative models are most commonly used. Therefore, this paper constructs multi-turn dialogue data for a Korean generative model. We also improve the dialogue ability of the generative model through transfer learning and compare and analyze its performance. In addition, we propose a method that compensates for the model's insufficient dialogue generation ability by extracting recommended response candidates from external knowledge through a retrieval model.
https://doi.org/10.15207/JKCS.2022.13.01.013

Opacity and Presupposition Inheritance in Belief Contexts

Kim, Kyoung-Ae
- Language and Information / v.3 no.2 / pp.67-83 / 1999
This paper attempts to provide an account of the problems of the intensional opacity of referring expressions and of presupposition inheritance (PI) in belief contexts from a discourse perspective. I discuss Jaszczolt's discourse model, based on DRT, for accounting for belief reports. Jaszczolt analyzes referring expressions in terms of three readings (de re, de dicto$_1$, and de dicto$_2$) and represents the differences between them in DRSs via different anchoring modes: external anchoring, formal anchoring, and nonanchoring. I propose an extended model to account for presupposition inheritance in belief contexts and analyze Korean data based on this model. The differences in PI and in the DRS representations induced by the different complement types, $\ldots$ko(mitta) and $\ldots$kesul(mitta), are discussed.

The Relationship between the Mental Model and the Depictive Gestures Observed in the Explanations of Elementary School Students about the Reason Why Seasons change (계절의 변화 원인에 대한 초등학생들의 설명에서 확인된 정신 모델과 묘사적 몸짓의 관계 분석)

Kim, Na-Young;Yang, Il-Ho;Ko, Min-Seok
- Journal of the Korean Society of Earth Science Education / v.7 no.3 / pp.358-370 / 2014
The purpose of this study is to analyze the relationship between the mental models and the depictive gestures observed in elementary school students' explanations of why the seasons change. The analysis of gestures by mental model type showed that CM-type students remembered the mental model as "motion" and used more "Exphoric" gestures, which expressed gesture as a language. CF-type students remembered it as "writings or pictures" and used metaphoric gestures when explaining some alternative concepts. CF-UM-type students explained in detail with language and showed many "Lexical" gestures. The analysis of depictive gestures showed that, even within sub-categories such as rotation, revolution, and meridian altitude, a great variety of gestures was expressed, such as pointing with fingers, palms, arms, ball-point pens, and fists, or drawing and spinning with them. Through this, we could check the students' conceptual understanding. In addition, by analyzing inconsistencies among external representations such as verbal language and gesture, writing and gesture, and picture and gesture, we found that gestures can help in understanding students' mental models and sometimes convey information that linguistic explanations or pictures do not. Finally, we examined two participants who showed conspicuous differences. One appeared to be wrong because he used his own expressions, yet his gestures were precise, while the other appeared to be accurate, but analysis of his gestures revealed idiosyncratic concepts.
https://doi.org/10.15523/JKSESE.2014.7.3.358

An Empirical Study on the Factors Affecting Diffusion of Object-Oriented Technology (객체지향 기술의 확산에 영향을 주는 요인에 관한 경험적 연구)

이민화
- The Journal of Information Systems / v.10 no.1 / pp.97-126 / 2001
Object orientation has been proposed as a promising software process innovation for improving software productivity and quality. However, it is not clearly understood which factors influence the diffusion of object-oriented technology in organizations. A research model was formulated and hypotheses were generated based on the literature on information technology implementation and software process innovation. To test the hypotheses, a questionnaire survey was conducted. The results, based on 121 responses from Korean companies, revealed that project characteristics, use of external experts, and the number of development projects are significantly related to the diffusion of object-oriented analysis and design and of object-oriented programming. The presence of an innovation champion is positively related to the diffusion of object-oriented analysis and design, but not to the diffusion of object-oriented programming languages. Only project complexity was significantly related to the diffusion of visual programming languages. Organizational size, on the other hand, was not significantly related to the diffusion of any object-oriented technology in this study.

Visual Dynamics Model for 3D Text Visualization

Lim, Sooyeon
- International Journal of Contents / v.14 no.4 / pp.86-91 / 2018
Text has evolved along with the history of art as a means of communicating human intentions and emotions. Text visualization artworks have also been combined with the social forms and contents of new media to produce social messages and related meanings. Recently, in text visualization artworks combined with digital media, forms of communication with viewers have become instantaneous and interactive, and viewers actively participate in creating artworks through direct engagement. Interactive text visualization, with the added interaction of viewers, generates external dynamics from text shapes and internal dynamics from the embedded meanings of the text. The purpose of this study is to propose a visual dynamics model that expresses the dynamics of text and to implement a text visualization system based on the model. The system uses the deconstruction of imaged text to create an interactive text visualization that reacts to the viewer's gestures in real time. Visual transformations synchronized with the viewer's intentions keep the text from being confined to the interpretation of linguistic symbols and extend its various meanings. The visualized text, in its various forms, shows visual dynamics that interpret meaning according to the viewer's cultural background.
https://doi.org/10.5392/IJoC.2018.14.4.086

Design of Extended Database Query language Based on SMIL 2.0 (SMIL 2.0을 기반으로 하는 확장 데이터베이스 질의어 설계)

Lee Jung-hwa;Moon Kyong-hi;Yun Hong-won
- Journal of the Korea Institute of Information and Communication Engineering / v.9 no.7 / pp.1555-1560 / 2005
Query results are usually presented with an external tool or a report generator, but because the methods of preparing and storing presentations have not been standardized, other applications have difficulty using query results. Thus, a multimedia data query language needs to define presentation in a standardized way. In this paper, we design an extended SQL based on SMIL 2.0 that effectively supports the proposed presentation model. Furthermore, this study proposes methods of using query results in various multimedia applications.
