• Title/Summary/Keyword: language learning model (언어 학습 모델)

Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)

  • Lee, Changjae;Ra, Dongyul
    • KIPS Transactions on Software and Data Engineering / v.11 no.4 / pp.169-178 / 2022
  • Morphemes are the most primitive units of a language: they lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces, and each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) divides the eojeols of a given Korean sentence into morpheme units and assigns an appropriate part-of-speech (POS) tag to each resulting morpheme. KMA is one of the most important tasks in Korean natural language processing (NLP), and improving its performance is closely tied to improving the performance of other Korean NLP tasks. Recent research on KMA has begun to adopt the approach of machine translation (MT) models. MT converts a sequence (sentence) of units in one domain into a sequence (sentence) of units in another domain. Neural machine translation (NMT) refers to MT approaches that exploit neural network models. From the MT perspective, KMA transforms an input sequence of units in the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is the BERT-fused model, which has been shown to achieve high performance on NMT. The BERT-fused model combines Transformer, a representative model used in NMT, with BERT, a language representation model that has enabled significant advances in NLP. The experimental results show that our model achieves an F1 score of 98.24.
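
To make the task framing and the reported metric concrete, here is a minimal sketch of how an F1 score over (morpheme, POS tag) pairs could be computed. The abstract does not say how its F1 is defined, so the multiset-overlap definition, the function name `morpheme_f1`, and the Sejong-style tags in the example are all assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, POS tag); tag names are illustrative


def morpheme_f1(gold: List[Morpheme], pred: List[Morpheme]) -> float:
    """F1 over the multiset overlap of (morpheme, tag) pairs (assumed definition)."""
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Counter & Counter keeps the minimum count of each shared pair.
    overlap = sum((gold_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical analysis of the two eojeols "학교에 갔다" ("went to school").
gold = [("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("았", "EP"), ("다", "EF")]
# A prediction that fails to split "갔" into "가" + "았".
pred = [("학교", "NNG"), ("에", "JKB"), ("갔", "VV"), ("다", "EF")]

print(f"F1 = {morpheme_f1(gold, pred):.4f}")
```

In this toy case three of the four predicted pairs match three of the five gold pairs, so precision is 3/4 and recall is 3/5.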

Conformer with lexicon transducer for Korean end-to-end speech recognition (Lexicon transducer를 적용한 conformer 기반 한국어 end-to-end 음성인식)

  • Son, Hyunsoo;Park, Hosung;Kim, Gyujin;Cho, Eunsoo;Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.530-536 / 2021
  • Recently, owing to advances in deep learning, end-to-end speech recognition, which maps speech signals directly to graphemes, has shown good performance. Among end-to-end models, the conformer shows the best performance. However, end-to-end models focus only on the probability of which grapheme appears at each time step. Decoding uses greedy search or beam search, and this kind of decoding is easily affected by the final probabilities output by the model. In addition, end-to-end models cannot use external pronunciation and language information because of a structural limitation. Therefore, this paper proposes a conformer with a lexicon transducer. We compare a phoneme-based model with a lexicon transducer against a grapheme-based model with beam search. The test set consists of words that do not appear in the training data. The grapheme-based conformer with beam search shows a character error rate (CER) of 3.8 %, and the phoneme-based conformer with the lexicon transducer shows a CER of 3.4 %.
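
For readers unfamiliar with the CER figures quoted above, the following is a minimal sketch of the usual definition: the character-level edit distance between reference and hypothesis, divided by the reference length. Whether the paper counts spaces or scores at the syllable or jamo level is not stated, so this version, which counts Unicode code points as-is, is an assumption, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted syllable out of five.
print(cer("안녕하세요", "안녕하세오"))
```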

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata / v.6 no.1 / pp.63-81 / 2021
  • The performance of natural language processing (NLP) is improving rapidly thanks to the recent development and application of machine learning and deep learning technologies, and its field of application is expanding as a result. In particular, as demand for the analysis of unstructured text data increases, so does interest in NLP. However, because of the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theory, the barriers to using NLP remain high. To give an overall understanding of NLP, this paper examines the main fields of NLP that are being actively researched and the current state of the major technologies centered on machine learning and deep learning, with the aim of providing a foundation for understanding and using NLP more easily. To that end, we investigated how NLP has changed within artificial intelligence (AI) by tracing changes in the taxonomy of AI technology. The main areas of NLP (language models, text classification, text generation, document summarization, question answering, and machine translation) are explained along with state-of-the-art deep learning models. In addition, the major deep learning models used in NLP are explained, and the datasets and evaluation measures used for performance evaluation are summarized. We hope that researchers who want to use NLP for various purposes in their own fields will be able to understand the overall technical status and the main technologies of NLP through this paper.

Audience Cognitive Reconstruction of the Extended Meaning of Complex Mechanism Text : For Communication Education using Story Media Expressions (복합기제 텍스트의 확장 의미에 대한 수용자의 인지적 재구성 : 서사적 미디어 표현을 활용한 의사소통 교육을 위해)

  • Lim, Ji-Won
    • Journal of Korea Entertainment Industry Association / v.15 no.7 / pp.137-143 / 2021
  • This discussion is a qualitative study of the possibility of linking communication education for college students with literacy education for Korean language-related educators, based on a theory of interpreting the cognitive meaning of media texts that contain complex mechanisms. The implicit meaning of a media content expression used as an interactive communication strategy is accepted through multiple interpretations that depend on the individual learner's cognitive environment. If so, how is the general meaning intended by the content creator actually being received? This question is the starting point for the discussion. To address it, I drew on the experimental pragmatic methodology of cognitive aesthetics and applied the relevance model of cognitive linguistics to connect learners' creative cognitive environments with the presented content and to identify contrasts. As a result, it was possible to establish a basic framework in which learners express their subjectivity and creative thinking and connect their cognitive environment with the presented content themselves. In particular, active and positive learners also produced direct descriptive expressions that build a new cognitive environment, such as suggesting a third alternative, demonstrating the ability to question produced media texts and the validity of the meaning implied in them. Because media text containing complex mechanisms is an indirect, persuasive communication behavior that occurs readily across various media in modern society, a universal communication principle of reliable conversation between media text creators and audiences should exist.

Design and Implementation of Agent Systems based on Case Markup Language for e-Learning (e-Learning을 위한 사례 마크업 언어 기반 에이전트 시스템의 설계 및 구현 :사례 기반 학습자 모델을 중심으로)

  • 한선관;윤정섭;조근식
    • The Journal of Society for e-Business Studies / v.6 no.3 / pp.63-80 / 2001
  • Constructing a representation of student knowledge in e-Learning systems, namely student modeling, is a core component of developing such systems. However, existing e-Learning systems have many problems sharing knowledge across heterogeneous student models and distributed knowledge bases. Because each e-Learning system uses a different method of knowledge representation, the accumulated knowledge cannot be used or shared without a great deal of difficulty. To share this knowledge, existing systems must reconstruct their knowledge bases. To overcome these problems, we propose a new XML-based Case Markup Language (CaseML). Distributed e-Learning systems can then easily share and manage heterogeneous knowledge bases represented in CaseML. Moreover, students can generate and share case knowledge by using the communication protocol of agents. In this paper, we have designed and developed CaseML using a knowledge markup language. Furthermore, to construct intelligent e-Learning systems, we have designed and developed an intelligent agent system that uses CaseML.

Psalm Text Generator Comparison Between English and Korean Using LSTM Blocks in a Recurrent Neural Network (순환 신경망에서 LSTM 블록을 사용한 영어와 한국어의 시편 생성기 비교)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.269-271 / 2022
  • In recent years, recurrent neural networks (RNNs) with LSTM blocks have been used extensively in machine learning tasks that process sequential data. These networks have proven particularly good at sequential language processing tasks because they predict the next most likely word in a given sequence more accurately than traditional neural networks. This study trained an RNN/LSTM neural network on three different translations of 150 biblical Psalms, in both English and Korean. The resulting model is then fed an input word and a length, from which it automatically generates a new Psalm of the desired length based on the patterns it recognized during training. The results of training the network on English text and on Korean text are compared and discussed.

Analysis of Vocabulary Relations by Dimensional Reduction for Word Vectors Visualization (차원감소 단어벡터 시각화를 통한 어휘별 관계 분석)

  • Ko, Kwang-Ho;Paik, Juryon
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.13-16 / 2022
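  • When a language model is obtained with deep learning techniques such as LSTM, word vectors for the vocabulary of the training corpus are obtained as a kind of by-product. Reducing these word vectors to two dimensions and plotting them on a plane makes it possible to see intuitively the relative distances and angles among the key words of the target sentence or document. In this study, we selected specific works, centering on the poems of Ki Hyung-do (기형도), plotted the dimension-reduced word vectors of the key words of each poem on a 2D plane, and analyzed how those distances and angles change depending on the text preprocessing used to obtain the word vectors. Because the distance between words can change the results of clustering and classification, and the angle can change the results of similarity and analogy operations, intuitively checking the relative distances and angles of key words on the plane allowed us to observe how the results of clustering/classification and of similarity-based recommendation/analogy tasks change. These results suggest that in fields where mood and affect depend on the placement of each individual word, such as movie recommendations/reviews and literary works, intuitively checking in advance the changes in distance and angle caused by text preprocessing would allow tasks such as classification and similarity-based recommendation to be performed more precisely.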
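
The two quantities this abstract reads off the 2D plot, distance and angle, can be sketched as follows. The coordinates are hypothetical placeholders rather than values from the paper, and the dimensionality reduction step itself is not shown because the abstract does not name the method used.

```python
import math
from typing import Sequence


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two points (drives clustering/classification)."""
    return math.dist(u, v)  # available in Python 3.8+


def angle_degrees(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two vectors from the origin (drives cosine similarity)."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


# Hypothetical 2D coordinates of three key words after dimensionality reduction.
words = {"word_a": (1.0, 0.0), "word_b": (0.0, 1.0), "word_c": (2.0, 0.1)}
for name in ("word_b", "word_c"):
    d = distance(words["word_a"], words[name])
    a = angle_degrees(words["word_a"], words[name])
    print(f"word_a vs {name}: distance={d:.3f}, angle={a:.1f} deg")
```

Two words can be far apart yet nearly parallel (small angle), which is why the abstract treats distance-based and angle-based tasks separately.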

Classification of infant cries using 3D feature vectors (3D 특징 벡터를 이용한 영아 울음소리 분류)

  • Park, JeongHyeon;Kim, MinSeo;Choi, HyukSoon;Moon, Nammee
    • Annual Conference of KIPS / 2022.11a / pp.597-599 / 2022
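  • Infants express all of their needs through crying, a nonverbal means of communication. However, working out what an infant's cry means is difficult, and much research has been conducted on interpreting infant cries. This paper proposes classifying infant cries using 3D feature vectors. We use the Donate-a-corpus-cry dataset, whose data are classified into five classes: belly pain, burping, discomfort, hunger, and tiredness. The data are augmented by tempo adjustment, which changes the speed to 90% and 110% of the original. Each sample is converted into Spectrogram, Mel-Spectrogram, and MFCC feature vectors, and these 2D feature vectors are then bundled together into a 3D feature vector. The 3D feature vectors are then used to train ResNet and EfficientNet models. As a result, the 2D feature vectors achieved an F1 score of 0.89 and the 3D feature vectors achieved 0.98, a performance improvement of 0.09.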
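
The bundling step described above can be sketched as stacking three equally shaped 2D maps along a new leading axis, much like the channels of an image. How the paper brings the Spectrogram, Mel-Spectrogram, and MFCC maps to a common shape is not stated, so this sketch simply assumes they already match; the function name and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def stack_features(*maps: Matrix) -> List[Matrix]:
    """Stack equally shaped 2D feature maps into one 3D structure."""
    if not maps:
        raise ValueError("at least one feature map is required")
    # Describe each map by its row lengths; all maps must agree exactly.
    shapes = {tuple(len(row) for row in m) for m in maps}
    if len(shapes) != 1:
        raise ValueError("all feature maps must have the same shape")
    # Copy rows so later edits to the inputs do not leak into the result.
    return [[list(row) for row in m] for m in maps]


# Toy 2x3 stand-ins for the three feature types (values are placeholders).
spectrogram = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mel_spectrogram = [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]]
mfcc = [[2.0, 2.1, 2.2], [2.3, 2.4, 2.5]]

cube = stack_features(spectrogram, mel_spectrogram, mfcc)
print(len(cube), len(cube[0]), len(cube[0][0]))  # channels, rows, columns
```

The resulting layout (channels first) matches what image models such as ResNet commonly expect, although the input layout actually used in the paper is not given.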

Sequence Labeling-based Multiple Causal Relations Extraction using Pre-trained Language Model for Maritime Accident Prevention (해양사고 예방을 위한 사전학습 언어모델의 순차적 레이블링 기반 복수 인과관계 추출)

  • Ki-Yeong Moon;Do-Hyun Kim;Tae-Hoon Yang;Sang-Duck Lee
    • Journal of the Korean Society of Safety / v.38 no.5 / pp.51-57 / 2023
  • Numerous studies have been conducted to analyze the causal relationships of maritime accidents using natural language processing techniques. However, when multiple causes and effects are associated with a single accident, the effectiveness of extracting these causal relations diminishes. To address this challenge, we compiled a dataset using verdicts from maritime accident cases in this study, analyzed their causal relations, and applied labeling considering the association information of various causes and effects. In addition, to validate the efficacy of our proposed methodology, we fine-tuned the KoELECTRA Korean language model. The results of our validation process demonstrated the ability of our approach to successfully extract multiple causal relationships from maritime accident cases.

A Study on Korean Poetry Generation System Based on Artificial Intelligence (인공지능 기반 한국어 시 생성 시스템 개발 연구)

  • Myung-sun Kim;Woo-Hyuk Jung;Jihwan Woo
    • Information Systems Review / v.25 no.3 / pp.43-57 / 2023
  • In this study, we developed an AI-based system that generates sentences to assist in creating Korean poetry. Rather than replacing the creative aspect of composition, which is considered a uniquely human domain, the focus was on generating foundational sentences that efficiently stimulate human imagination. Through interviews with poets, we extracted sentences from eight distinct datasets, enabling the generation of poetry across eight different genres. This study stands out for its innovation in developing a method for crafting literary works in Korean. Its significance lies in its potential to facilitate the creation of diverse literary forms such as essays, prose, or novels.