• Title/Abstract/Keywords: Language Model

Search results: 2,692 items, processing time 0.028 s

딥러닝 사전학습 언어모델 기술 동향 (Recent R&D Trends for Pretrained Language Model)

  • 임준호;김현기;김영길
    • 전자통신동향분석 / Vol. 35, No. 3 / pp.9-19 / 2020
  • Recently, pretraining a deep learning language model on a large corpus and then fine-tuning it for each application task has become a standard approach in language processing. Pretrained language models achieve higher accuracy and better generalization than previous methods. This paper surveys the major research trends related to deep learning pretrained language models in the field of language processing. We describe in detail the motivation, model, training methods, and results of the BERT language model, which strongly influenced subsequent studies. We then review post-BERT language models, focusing on SpanBERT, RoBERTa, ALBERT, BART, and ELECTRA. Finally, we introduce KorBERT, a pretrained language model that performs well on Korean. We also describe techniques for applying pretrained language models to Korean, an agglutinative language in which words are formed by combining content and functional morphemes, unlike English, an inflectional language in which word endings themselves change.
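
To make the pretrain-then-fine-tune pattern described above concrete, here is a minimal sketch using the Hugging Face transformers API. The multilingual checkpoint and the two-sentence dataset are illustrative placeholders (KorBERT itself is distributed separately by ETRI), not the paper's actual setup.

```python
# Minimal sketch: fine-tuning a pretrained encoder for sentence classification.
# "bert-base-multilingual-cased" stands in for a Korean checkpoint such as
# KorBERT; the two-sentence training set is a toy placeholder.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

texts = ["이 영화 정말 재미있다.", "정말 지루한 영화였다."]  # toy sentiment data
labels = [1, 0]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(),
)
trainer.train()  # fine-tunes all encoder weights on the downstream task
```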

정보검색 기법과 동적 보간 계수를 이용한 N-gram 언어모델의 적응 (N-gram Adaptation Using Information Retrieval and Dynamic Interpolation Coefficient)

  • 최준기;오영환
    • 대한음성학회지:말소리 / No. 56 / pp.207-223 / 2005
  • The goal of language model adaptation is to improve a background language model using a relatively small adaptation corpus. This study presents a language model adaptation technique for the case where no additional text data are available for adaptation. We propose using information retrieval (IR) with N-gram language modeling to collect an adaptation corpus from the baseline text data. We also propose a dynamic interpolation coefficient for combining the background language model with the adapted language model. The coefficient is estimated from word hypotheses obtained by segmenting the input speech data reserved as held-out validation data, which allows the final adapted model to consistently improve on the background model. The proposed approach reduces the word error rate by 13.6% relative to a baseline 4-gram model on two hours of broadcast news speech recognition.
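
The interpolation scheme the abstract describes reduces to a single formula, P(w|h) = λ·P_adapted(w|h) + (1−λ)·P_background(w|h), with λ chosen on held-out word hypotheses. Below is a minimal sketch under that reading; the probability lookups and the grid search stand in for the paper's actual estimation procedure.

```python
import math

def interpolate(p_background, p_adapted, lam):
    """P(w|h) = lam * P_adapted(w|h) + (1 - lam) * P_background(w|h)."""
    return lam * p_adapted + (1.0 - lam) * p_background

def estimate_lambda(heldout, p_bg, p_ad, grid=None):
    """Pick the coefficient maximizing held-out log-likelihood.

    heldout: (word, history) pairs taken from recognizer word hypotheses;
    p_bg / p_ad: dicts mapping such pairs to model probabilities.
    """
    grid = grid or [i / 20 for i in range(1, 20)]
    def loglik(lam):
        return sum(math.log(interpolate(p_bg[pair], p_ad[pair], lam))
                   for pair in heldout)
    return max(grid, key=loglik)

# Toy usage: the adapted model fits the held-out data better, so lambda is high.
pairs = [("news", "the"), ("anchor", "news")]
p_bg = {pairs[0]: 0.01, pairs[1]: 0.001}
p_ad = {pairs[0]: 0.05, pairs[1]: 0.02}
print(estimate_lambda(pairs, p_bg, p_ad))  # -> 0.95
```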

Dependency Structure Applied to Language Modeling for Information Retrieval

  • Lee, Chang-Ki;Lee, Gary Geun-Bae;Jang, Myung-Gil
    • ETRI Journal / Vol. 28, No. 3 / pp.337-346 / 2006
  • In this paper, we propose a new language model for information retrieval, a dependency structure language model, to compensate for the weaknesses of unigram and bigram language models. The dependency structure language model is based on a first-order dependency model and the dependency parse tree generated by a linguistic parser, so long-distance dependencies are captured naturally. We carried out extensive experiments to verify the proposed model: the dependency structure model outperforms recently proposed language models and the Okapi BM25 method, and the dependency structure is more effective than unigrams and bigrams in language modeling for information retrieval.
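
As a rough illustration of how a first-order dependency model can extend unigram scoring in the query-likelihood framework, here is a hedged sketch; the linear smoothing scheme, count tables, and collection-probability callables are assumptions for the example, not the paper's exact formulation.

```python
import math

def dependency_lm_score(query_terms, dep_pairs, doc_counts, pair_counts,
                        doc_len, coll_prob, coll_pair_prob,
                        lam=0.5, mu=0.5):
    """Log-score of a document for a query: unigram term probabilities plus
    probabilities of (head, dependent) pairs from the query's parse tree,
    each linearly smoothed with collection statistics."""
    total_pairs = max(1, sum(pair_counts.values()))
    score = 0.0
    for w in query_terms:          # unigram component
        p = lam * doc_counts.get(w, 0) / doc_len + (1 - lam) * coll_prob(w)
        score += math.log(p)
    for head, dep in dep_pairs:    # first-order dependency component
        p = (mu * pair_counts.get((head, dep), 0) / total_pairs
             + (1 - mu) * coll_pair_prob(head, dep))
        score += math.log(p)
    return score
```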

Examining Generalizability of Kang's (1999) Model of Structural Relationships between ESL Learning Strategy Use and Language Proficiency

  • 강성우
    • 영어어문교육 / Vol. 7, No. 2 / pp.55-75 / 2002
  • The present study examined whether Kang's (1999) model of the relationships between language learning strategy use and language proficiency for Asian students could be applied to a more heterogeneous group. In Kang's study, he collected information on the language learning strategies of 957 foreign students learning English as a second language in American colleges through a questionnaire, and measured the subjects' language proficiency with the Institutional Testing Program TOEFL (Test of English as a Foreign Language). This study analyzed the same data without the restriction to one cultural identity. Structural equation modeling was used to model the relationships between strategy use and language proficiency, and the resulting model was descriptively compared with Kang's (1999) model for the Asian students. The overall pattern of relationship paths varied very little across the two models, indicating that the generalizability of Kang's (1999) model may extend further than originally examined.

문맥의존 철자오류 후보 생성을 위한 통계적 언어모형 개선 (Improved Statistical Language Model for Context-sensitive Spelling Error Candidates)

  • 이정훈;김민호;권혁철
    • 한국멀티미디어학회논문지 / Vol. 20, No. 2 / pp.371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data behind the statistical language model. In general, the quality of a statistical language model is proportional to the size of its data, but as the amount of data grows, processing becomes slower and storage requirements increase. We propose an improved statistical language model that addresses this problem, together with an effective method for generating spelling error candidates based on the new model. The proposed model and the correction method built on it improve both the accuracy and the processing speed of spelling error correction.
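
A minimal sketch of the candidate-ranking step the abstract describes: a confusion set of easily confused words is rescored with an n-gram language model, and the highest-probability sentence wins. The toy bigram table and confusion set below are illustrative, and the paper's actual contribution, an improved and more compact statistical model, is not reproduced here.

```python
import math

bigram_logp = {  # toy log-probabilities; a real table comes from a corpus
    ("나는", "사과를"): -1.0, ("사과를", "먹었다"): -0.7,
    ("나는", "사고를"): -2.5, ("사고를", "먹었다"): -6.0,
}

def sentence_logp(words, floor=-10.0):
    """Bigram log-probability of a sentence, with a floor for unseen pairs."""
    return sum(bigram_logp.get(b, floor) for b in zip(words, words[1:]))

def best_candidate(words, position, confusion_set):
    """Replace words[position] with each candidate and keep the most
    probable sentence under the language model."""
    def with_cand(c):
        return words[:position] + [c] + words[position + 1:]
    return max(confusion_set, key=lambda c: sentence_logp(with_cand(c)))

print(best_candidate(["나는", "사고를", "먹었다"], 1, ["사고를", "사과를"]))
# -> "사과를": the surrounding context makes the correction unambiguous
```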

프로그래밍 언어 교육을 위한 교수·학습 모델 설계 (Design of Teaching·Learning Model for Programming Language Education)

  • 강환수
    • 디지털콘텐츠학회 논문지 / Vol. 13, No. 4 / pp.517-524 / 2012
  • In this paper, we design a teaching and learning model for programming language education. Universities covering a wide range of disciplines offer and run many courses on computer programming languages. Although a variety of programming languages have been developed, and development environments have become more accessible to users, many novice learners still find it difficult to learn a programming language, and instructors likewise lack suitable teaching and learning methods for effective programming language education. In this paper, we designed a teaching and learning model for programming language education based on blended learning and academic achievement. Applying the proposed model to a course in the second semester of 2011 showed that it was effective for learners' programming language education.

문장음성인식을 위한 VCCV 기반의 효율적인 언어모델 (Efficient Language Model based on VCCV unit for Sentence Speech Recognition)

  • 박선희;노용완;홍광석
    • 대한전기학회:학술대회논문집 / 대한전기학회 2003년도 학술회의 논문집 정보 및 제어부문 B / pp.836-839 / 2003
  • In this paper, we implement a bigram language model and evaluate smoothing techniques appropriate for a unit with low perplexity. Word, morpheme, and clause units are widely used as the processing units of language models. We propose VCCV units, which have a smaller vocabulary than morpheme and clause units, and compare them with clause and morpheme units in terms of perplexity. The most common metric for evaluating a language model is the probability it assigns to test data, from which perplexity is derived. Smoothing is used to estimate probabilities when there are insufficient data to estimate them accurately. We constructed N-grams of the VCCV units with low perplexity and tested the language model using Katz, Witten-Bell, absolute, and modified Kneser-Ney smoothing, among others. In the experiments, modified Kneser-Ney smoothing proved to be the appropriate smoothing technique for VCCV units.
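
A sketch of the perplexity comparison described above, using NLTK's interpolated Kneser-Ney model as a stand-in for the modified Kneser-Ney variant the paper tests. The tiny token lists are placeholders for real morpheme- and VCCV-segmented corpora.

```python
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import bigrams

def bigram_perplexity(train_sents, test_sents, order=2):
    """Train an interpolated Kneser-Ney bigram model and score held-out text."""
    train, vocab = padded_everygram_pipeline(order, train_sents)
    lm = KneserNeyInterpolated(order)
    lm.fit(train, vocab)
    test = [ng for s in test_sents
            for ng in bigrams(pad_both_ends(s, n=order))]
    return lm.perplexity(test)

# Placeholder corpus: a real experiment would compare morpheme- vs.
# VCCV-segmented versions of the same text and pick the lower-perplexity unit.
morph_sents = [["나", "는", "학교", "에", "가", "ㄴ다"]] * 50
print(bigram_perplexity(morph_sents[:40], morph_sents[40:]))
```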

Towards a small language model powered chain-of-reasoning for open-domain question answering

  • Jihyeon Roh;Minho Kim;Kyoungman Bae
    • ETRI Journal / Vol. 46, No. 1 / pp.11-21 / 2024
  • We focus on open-domain question-answering tasks that involve a chain-of-reasoning, which are primarily implemented using large language models. With an emphasis on cost-effectiveness, we designed EffiChainQA, an architecture centered on the use of small language models. We employed a retrieval-based language model to address the limitations of large language models, such as the hallucination issue and the lack of updated knowledge. To enhance reasoning capabilities, we introduced a question decomposer that leverages a generative language model and serves as a key component in the chain-of-reasoning process. To generate training data for our question decomposer, we leveraged ChatGPT, which is known for its data augmentation ability. Comprehensive experiments were conducted using the HotpotQA dataset. Our method outperformed several established approaches, including the Chain-of-Thoughts approach, which is based on large language models. Moreover, our results are on par with those of state-of-the-art Retrieve-then-Read methods that utilize large language models.
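
The abstract's pipeline, decompose the question with a small generative model, retrieve evidence per sub-question, and read out answers in sequence, can be sketched as below. The decomposer, retriever, and reader are hypothetical callables, and the "#1" placeholder convention for chaining earlier answers is an assumption modeled on common multi-hop datasets, not EffiChainQA's released interface.

```python
def chain_of_reasoning(question, decomposer, retriever, reader, max_hops=3):
    """Hypothetical EffiChainQA-style loop: a small generative model splits
    the question, a retriever grounds each sub-question, a reader answers."""
    answers = []
    sub_questions = decomposer(question)[:max_hops]
    for sub_q in sub_questions:
        # Fill earlier answers into the sub-question (multi-hop chaining).
        for i, a in enumerate(answers):
            sub_q = sub_q.replace(f"#{i + 1}", a)
        passages = retriever(sub_q)            # retrieval-based grounding
        answers.append(reader(sub_q, passages))
    return answers[-1]  # the final hop answers the original question
```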

초등학교 교사를 위한 격려 언어 모형 개발 (Development of the Encouraging Language Model for Elementary School Teachers)

  • 선영운;오익수
    • 초등상담연구 / Vol. 10, No. 1 / pp.39-56 / 2011
  • The purpose of this study is to derive the elements of encouraging language by synthesizing what prior studies on encouragement say about methods of encouragement, and to use these elements as a theoretical basis for developing an encouraging language model for elementary school teachers. To this end, we first collected material from prior studies on encouragement, focusing on methods of encouragement. We then analyzed the collected material and categorized it according to the main concepts that appeared in it, which yielded 17 subcategories. Categorizing these subcategories again produced five higher-order categories: recognizing a child's worth as a person, trusting a child's qualities, thinking rationally about mistakes, giving non-evaluative feedback on a child's behavior, and reflecting a child's positive emotions. We adopted these five categories as the elements of encouraging language and, based on them, developed an encouraging language model with example sentences for use in various classroom situations.

Subword Neural Language Generation with Unlikelihood Training

  • Iqbal, Salahuddin Muhammad;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication / Vol. 12, No. 2 / pp.45-50 / 2020
  • A neural language model is commonly trained with a likelihood loss so that it learns the sequences of human text. State-of-the-art results have been achieved in various language generation tasks, e.g., text summarization, dialogue response generation, and text generation, by utilizing the language model's next-token output probabilities. Monotonous and repetitive output is a well-known problem of such models, yet only a few solutions have been proposed to address it. Several decoding techniques have been proposed to suppress repetitive tokens. Unlikelihood training approaches this problem by penalizing the probabilities of candidate tokens that have already been seen in previous steps. While this method successfully yields less repetitive generations, it consumes a large amount of memory because training requires a large vocabulary. We effectively reduce the memory footprint by encoding words as sequences of subword units. Finally, we report results competitive with token-level unlikelihood training in several automatic evaluations compared with the previous work.
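
A sketch of token-level unlikelihood training as the abstract describes, applied to subword ID sequences. Using previous context tokens as negative candidates follows the original unlikelihood formulation (Welleck et al., 2020); the shapes and the alpha weight here are illustrative defaults, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, alpha=1.0):
    """logits: (T, V) per-step scores; targets: (T,) gold subword IDs.
    Returns the MLE loss plus a penalty on re-generating seen tokens."""
    logp = F.log_softmax(logits, dim=-1)
    mle = F.nll_loss(logp, targets)               # standard likelihood term

    probs = logp.exp()
    T, V = probs.shape
    # Negative candidates at step t: tokens seen before t, except the gold token.
    prev = targets.unsqueeze(0).expand(T, T)      # row t holds the full target row
    mask = torch.tril(torch.ones(T, T), diagonal=-1).bool()  # True where j < t
    neg = torch.zeros(T, V, dtype=torch.bool)
    neg[mask.nonzero(as_tuple=True)[0], prev[mask]] = True
    neg.scatter_(1, targets.unsqueeze(1), False)  # never penalize the gold token

    if neg.any():
        ul = -torch.log1p(-probs[neg].clamp(max=1.0 - 1e-6)).mean()
    else:
        ul = torch.tensor(0.0)
    return mle + alpha * ul

# Toy usage on random scores over a small subword vocabulary.
loss = unlikelihood_loss(torch.randn(6, 100), torch.tensor([3, 7, 3, 9, 7, 3]))
```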