• Title/Summary/Keyword: Large language models

Search Result 164, Processing Time 0.026 seconds

Enhancing LoRA Fine-tuning Performance Using Curriculum Learning

  • Daegeon Kim;Namgyu Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.3
    • /
    • pp.43-54
    • /
    • 2024
  • Recently, there has been a lot of research on utilizing Language Models, and Large Language Models have achieved innovative results in various tasks. However, the practical application faces limitations due to the constrained resources and costs required to utilize Large Language Models. Consequently, there has been recent attention towards methods to effectively utilize models within given resources. Curriculum Learning, a methodology that categorizes training data according to difficulty and learns sequentially, has been attracting attention, but it has the limitation that the method of measuring difficulty is complex or not universal. Therefore, in this study, we propose a methodology based on data heterogeneity-based Curriculum Learning that measures the difficulty of data using reliable prior information and facilitates easy utilization across various tasks. To evaluate the performance of the proposed methodology, experiments were conducted using 5,000 specialized documents in the field of information communication technology and 4,917 documents in the field of healthcare. The results confirm that the proposed methodology outperforms traditional fine-tuning in terms of classification accuracy in both LoRA fine-tuning and full fine-tuning.

Inducing Harmful Speech in Large Language Models through Korean Malicious Prompt Injection Attacks (한국어 악성 프롬프트 주입 공격을 통한 거대 언어 모델의 유해 표현 유도)

  • Ji-Min Suh;Jin-Woo Kim
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.3
    • /
    • pp.451-461
    • /
    • 2024
  • Recently, various AI chatbots based on large language models have been released. Chatbots have the advantage of providing users with quick and easy information through interactive prompts, making them useful in various fields such as question answering, writing, and programming. However, a vulnerability in chatbots called "prompt injection attacks" has been proposed. This attack involves injecting instructions into the chatbot to violate predefined guidelines. Such attacks can be critical as they may lead to the leakage of confidential information within large language models or trigger other malicious activities. However, the vulnerability of Korean prompts has not been adequately validated. Therefore, in this paper, we aim to generate malicious Korean prompts and perform attacks on the popular chatbot to analyze their feasibility. To achieve this, we propose a system that automatically generates malicious Korean prompts by analyzing existing prompt injection attacks. Specifically, we focus on generating malicious prompts that induce harmful expressions from large language models and validate their effectiveness in practice.

Recent R&D Trends for Pretrained Language Model (딥러닝 사전학습 언어모델 기술 동향)

  • Lim, J.H.;Kim, H.K.;Kim, Y.K.
    • Electronics and Telecommunications Trends
    • /
    • v.35 no.3
    • /
    • pp.9-19
    • /
    • 2020
  • Recently, a technique for applying a deep learning language model pretrained from a large corpus to fine-tuning for each application task has been widely used as a language processing technology. The pretrained language model shows higher performance and satisfactory generalization performance than existing methods. This paper introduces the major research trends related to deep learning pretrained language models in the field of language processing. We describe in detail the motivations, models, learning methods, and results of the BERT language model that had significant influence on subsequent studies. Subsequently, we introduce the results of language model studies after BERT, focusing on SpanBERT, RoBERTa, ALBERT, BART, and ELECTRA. Finally, we introduce the KorBERT pretrained language model, which shows satisfactory performance in Korean language. In addition, we introduce techniques on how to apply the pretrained language model to Korean (agglutinative) language, which consists of a combination of content and functional morphemes, unlike English (refractive) language whose endings change depending on the application.

Question Answering that leverage the inherent knowledge of large language models (거대 언어 모델의 내재된 지식을 활용한 질의 응답 방법)

  • Myoseop Sim;Kyungkoo Min;Minjun Park;Jooyoung Choi;Haemin Jung;Stanley Jungkyu Choi
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.31-35
    • /
    • 2023
  • 최근에는 질의응답(Question Answering, QA) 분야에서 거대 언어 모델(Large Language Models, LLMs)의 파라미터에 내재된 지식을 활용하는 방식이 활발히 연구되고 있다. Open Domain QA(ODQA) 분야에서는 기존에 정보 검색기(retriever)-독해기(reader) 파이프라인이 주로 사용되었으나, 최근에는 거대 언어 모델이 독해 뿐만 아니라 정보 검색기의 역할까지 대신하고 있다. 본 논문에서는 거대 언어 모델의 내재된 지식을 사용해서 질의 응답에 활용하는 방법을 제안한다. 질문에 대해 답변을 하기 전에 질문과 관련된 구절을 생성하고, 이를 바탕으로 질문에 대한 답변을 생성하는 방식이다. 이 방법은 Closed-Book QA 분야에서 기존 프롬프팅 방법 대비 우수한 성능을 보여주며, 이를 통해 대형 언어 모델에 내재된 지식을 활용하여 질의 응답 능력을 향상시킬 수 있음을 입증한다.

  • PDF

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

  • Lee, Chi Hoon;Lee, Yeon Ji;Lee, Dong Hee
    • Journal of Information Technology Services
    • /
    • v.19 no.5
    • /
    • pp.83-91
    • /
    • 2020
  • Language Models such as BERT has been an important factor of deep learning-based natural language processing. Pre-training the transformer-based language models would be computationally expensive since they are consist of deep and broad architecture and layers using an attention mechanism and also require huge amount of data to train. Hence, it became mandatory to do fine-tuning large pre-trained language models which are trained by Google or some companies can afford the resources and cost. There are various techniques for fine tuning the language models and this paper examines three techniques, which are data augmentation, tuning the hyper paramters and partly re-constructing the neural networks. For data augmentation, we use no-answer augmentation and back-translation method. Also, some useful combinations of hyper parameters are observed by conducting a number of experiments. Finally, we have GRU, LSTM networks to boost our model performance with adding those networks to BERT pre-trained model. We do fine-tuning the pre-trained korean-based language model through the methods mentioned above and push the F1 score from baseline up to 89.66. Moreover, some failure attempts give us important lessons and tell us the further direction in a good way.

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.242-247
    • /
    • 2004
  • Conventional keyword spotting systems use the connected word recognition network consisted by keyword models and filler models in keyword spotting. This is why the system can not construct the language models of word appearance effectively for detecting keywords in large vocabulary continuous speech recognition system with large text data. In this paper to solve this problem, we propose a keyword spotting system using pseudo N-gram language model for detecting key-words and investigate the performance of the system upon the changes of the frequencies of appearances of both keywords and filler models. As the results, when the Unigram probability of keywords and filler models were set to 0.2, 0.8, the experimental results showed that CA (Correctly Accept for In-Vocabulary) and CR (Correctly Reject for Out-Of-Vocabulary) were 91.1% and 91.7% respectively, which means that our proposed system can get 14% of improved average CA-CR performance than conventional methods in ERR (Error Reduction Rate).

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.357-362
    • /
    • 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing the speech recognition accuracy, among other things. However, spoken language domains are too numerous, and therefore developers suffer from the lack of corpora with sufficient sizes. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest, the other constructed from a large but less adequate corpus, resulting in a significantly enhanced language model. This method is based on the observation that a small corpus from the right domain has high quality n-grams but has serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but incorrectly biased. With our approach, two n-gram statistics are combined by extending the idea of Katz's backoff and therefore is called a dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of million words together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained significant improvement (30%) by incorporating a small corpus around one thirtieth size of the newspaper corpus.

An XPDL-Based Workflow Control-Structure and Data-Sequence Analyzer

  • Kim, Kwanghoon Pio
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.3
    • /
    • pp.1702-1721
    • /
    • 2019
  • A workflow process (or business process) management system helps to define, execute, monitor and manage workflow models deployed on a workflow-supported enterprise, and the system is compartmentalized into a modeling subsystem and an enacting subsystem, in general. The modeling subsystem's functionality is to discover and analyze workflow models via a theoretical modeling methodology like ICN, to graphically define them via a graphical representation notation like BPMN, and to systematically deploy those graphically defined models onto the enacting subsystem by transforming into their textual models represented by a standardized workflow process definition language like XPDL. Before deploying those defined workflow models, it is very important to inspect its syntactical correctness as well as its structural properness to minimize the loss of effectiveness and the depreciation of efficiency in managing the corresponding workflow models. In this paper, we are particularly interested in verifying very large-scale and massively parallel workflow models, and so we need a sophisticated analyzer to automatically analyze those specialized and complex styles of workflow models. One of the sophisticated analyzers devised in this paper is able to analyze not only the structural complexity but also the data-sequence complexity, especially. The structural complexity is based upon combinational usages of those control-structure constructs such as subprocesses, exclusive-OR, parallel-AND and iterative-LOOP primitives with preserving matched pairing and proper nesting properties, whereas the data-sequence complexity is based upon combinational usages of those relevant data repositories such as data definition sequences and data use sequences. Through the devised and implemented analyzer in this paper, we are able eventually to achieve the systematic verifications of the syntactical correctness as well as the effective validation of the structural properness on those complicate and large-scale styles of workflow models. As an experimental study, we apply the implemented analyzer to an exemplary large-scale and massively parallel workflow process model, the Large Bank Transaction Workflow Process Model, and show the structural complexity analysis results via a series of operational screens captured from the implemented analyzer.