• Title/Summary/Keyword: Large language models


Utilizing Large Language Models for Non-trained Binary Sentiment Classification (거대 언어 모델(LLM)을 이용한 비훈련 이진 감정 분류)

  • Hyungjin Ahn;Taewook Hwang;Sangkeun Jung
    • Annual Conference on Human and Language Technology / 2023.10a / pp.66-71 / 2023
  • Since the appearance of ChatGPT, a variety of large language models (LLMs) have emerged, and it has become possible to fine-tune these LLMs for specific purposes. However, training an LLM from scratch, and even simple tuning, requires an amount of computing resources that is difficult for ordinary users to access. In this study, we used publicly available LLMs without any additional training and examined their performance on a binary classification task via zero-shot prompting. Even without training or further tuning, the models achieved binary classification performance comparable to existing pre-trained language models, and the better-performing LLMs showed low classification failure rates and consistent results, indicating considerable practical utility.

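The abstract describes zero-shot prompting of an open LLM for binary sentiment classification without any training. A minimal sketch of that setup follows, assuming a Hugging Face instruction-tuned model; the checkpoint name, prompt wording, and failure handling are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: zero-shot binary sentiment classification with an open LLM.
# The checkpoint and prompt are illustrative; the paper does not specify them here.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # any open instruction-tuned LLM
)

def classify_sentiment(review: str) -> str:
    """Label a review as positive or negative with no fine-tuning (zero-shot prompting)."""
    prompt = (
        "Classify the sentiment of the following review as 'positive' or 'negative'. "
        "Answer with a single word.\n\n"
        f"Review: {review}\nSentiment:"
    )
    output = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    answer = output[len(prompt):].strip().lower()
    # Anything that is not clearly one of the two labels counts as a classification
    # failure, the failure rate the abstract refers to.
    if "positive" in answer:
        return "positive"
    if "negative" in answer:
        return "negative"
    return "failure"

print(classify_sentiment("The battery dies within an hour and the screen flickers."))
```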

Instruction Fine-tuning and LoRA Combined Approach for Optimizing Large Language Models (대규모 언어 모델의 최적화를 위한 지시형 미세 조정과 LoRA 결합 접근법)

  • Sang-Gook Kim;Kyungran Noh;Hyuk Hahn;Boong Kee Choi
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.2 / pp.134-146 / 2024
  • This study introduces and experimentally validates a novel approach that combines Instruction fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning to optimize the performance of Large Language Models (LLMs). These models have become revolutionary tools in natural language processing, showing remarkable performance across diverse application areas. However, optimizing their performance for specific domains necessitates fine-tuning of the base models (FMs), which is often limited by challenges such as data complexity and resource costs. The proposed approach aims to overcome these limitations by enhancing the performance of LLMs, particularly in the analysis precision and efficiency of national Research and Development (R&D) data. The study provides theoretical foundations and technical implementations of Instruction fine-tuning and LoRA fine-tuning. Through rigorous experimental validation, it is demonstrated that the proposed method significantly improves the precision and efficiency of data analysis, outperforming traditional fine-tuning methods. This enhancement is not only beneficial for national R&D data but also suggests potential applicability in various other data-centric domains, such as medical data analysis, financial forecasting, and educational assessments. The findings highlight the method's broad utility and significant contribution to advancing data analysis techniques in specialized knowledge domains, offering new possibilities for leveraging LLMs in complex and resource-intensive tasks. This research underscores the transformative potential of combining Instruction fine-tuning with LoRA fine-tuning to achieve superior performance in diverse applications, paving the way for more efficient and effective utilization of LLMs in both academic and industrial settings.
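
The combination the abstract describes, instruction-style fine-tuning applied through low-rank adapters, can be sketched with Hugging Face PEFT as below. The base checkpoint, LoRA rank, target modules, and prompt template are illustrative assumptions, not the settings used in the paper.

```python
# Minimal sketch of combining instruction fine-tuning with LoRA via Hugging Face PEFT.
# Base model, rank, and template are illustrative assumptions, not the paper's settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "EleutherAI/polyglot-ko-1.3b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA: train small low-rank adapter matrices instead of the full weights,
# which keeps memory and compute costs far below full fine-tuning.
lora_config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # attention projections to adapt (model-specific)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of the base parameters

def format_instruction(example: dict) -> str:
    """Instruction fine-tuning: cast each record as an instruction/response pair."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )
```

Training then proceeds with an ordinary causal language modeling objective over the instruction-formatted records (for example via the Hugging Face Trainer), updating only the adapter weights.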

Technical Trends in Hyperscale Artificial Intelligence Processors (초거대 인공지능 프로세서 반도체 기술 개발 동향)

  • W. Jeon;C.G. Lyuh
    • Electronics and Telecommunications Trends / v.38 no.5 / pp.1-11 / 2023
  • The emergence of generative hyperscale artificial intelligence (AI) has enabled new services, such as image-generating AI and conversational AI based on large language models. Such services likely lead to the influx of numerous users, who cannot be handled using conventional AI models. Furthermore, the exponential increase in training data, computations, and high user demand of AI models has led to intensive hardware resource consumption, highlighting the need to develop domain-specific semiconductors for hyperscale AI. In this technical report, we describe development trends in technologies for hyperscale AI processors pursued by domestic and foreign semiconductor companies, such as NVIDIA, Graphcore, Tesla, Google, Meta, SAPEON, FuriosaAI, and Rebellions.

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences / v.15 no.3 / pp.83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.
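
A minimal sketch of the fine-tuning setup the abstract describes, loading a Whisper checkpoint and converting one KsponSpeech utterance into model inputs, is shown below. The checkpoint size and the field names of the dataset records are illustrative assumptions.

```python
# Minimal sketch: preparing OpenAI's Whisper for fine-tuning with Hugging Face Transformers.
# The checkpoint size and the KsponSpeech record fields are illustrative assumptions.
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="Korean", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def prepare_example(batch: dict) -> dict:
    """Convert one utterance (audio array + transcript) into Whisper features and label ids."""
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch
```

Fine-tuning then runs as ordinary sequence-to-sequence training over these features, with a subset such as Set A or the full Set B supplying the mapped examples.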

Leveraging LLMs for Corporate Data Analysis: Employee Turnover Prediction with ChatGPT (대형 언어 모델을 활용한 기업데이터 분석: ChatGPT를 활용한 직원 이직 예측)

  • Sungmin Kim;Jee Yong Chung
    • Knowledge Management Research / v.25 no.2 / pp.19-47 / 2024
  • Organizational ability to analyze and utilize data plays an important role in knowledge management and decision-making. This study aims to investigate the potential application of large language models in corporate data analysis. Focusing on the field of human resources, the research examines the data analysis capabilities of these models. Using the widely studied IBM HR dataset, the study reproduces machine learning-based employee turnover prediction analyses from previous research through ChatGPT and compares its predictive performance. Unlike past research methods that required advanced programming skills, ChatGPT-based machine learning data analysis, conducted through the analyst's natural language requests, offers the advantages of being much easier and faster. Moreover, its prediction accuracy was found to be competitive compared to previous studies. This suggests that large language models could serve as effective and practical alternatives in the field of corporate data analysis, which has traditionally demanded advanced programming capabilities. Furthermore, this approach is expected to contribute to the popularization of data analysis and the spread of data-driven decision-making (DDDM). The prompts used during the data analysis process and the program code generated by ChatGPT are also included in the appendix for verification, providing a foundation for future data analysis research using large language models.
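
The workflow the abstract describes is a plain-language analysis request rather than hand-written code. A rough sketch of such a request is shown below; the authors worked in the ChatGPT interface with the CSV uploaded directly, so the API call, model name, and prompt wording here are illustrative assumptions only.

```python
# Rough sketch of the plain-language analysis request the study describes, expressed
# through the OpenAI API. In the actual workflow the analyst uploads the IBM HR CSV in
# the ChatGPT interface; the model name and wording below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

request = (
    "Using the IBM HR employee attrition dataset I have uploaded, train a machine "
    "learning model that predicts the 'Attrition' column, report accuracy, precision, "
    "recall, and F1 on a held-out test set, and list the most influential features."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": request}],
)
print(response.choices[0].message.content)
```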

Exploring Narrative Intelligence in AI: Implications for the Evolution of Homo narrans (인공지능의 서사 지능 탐구 : 새로운 서사 생태계와 호모 나랜스의 진화)

  • Hochang Kwon
    • Trans- / v.16 / pp.107-133 / 2024
  • Narratives are fundamental to human cognition and social culture, serving as the primary means by which individuals and societies construct meaning, share experiences, and convey cultural and moral values. The field of artificial intelligence, which seeks to mimic human thought and behavior, has long studied story generation and story understanding, and today's Large Language Models demonstrate remarkable narrative capabilities based on advances in natural language processing. This situation brings about a variety of changes and new issues, but a comprehensive discussion of them is still hard to find. This paper aims to provide a holistic view of the current state and future changes by exploring the intersections and interactions of human and AI narrative intelligence. It begins with a review of multidisciplinary research on the intrinsic relationship between humans and narrative, represented by the term Homo narrans, and then provides a historical overview of how narrative has been studied in the field of AI. It then explores the possibilities and limitations of narrative intelligence as revealed by today's Large Language Models, and presents three philosophical challenges for understanding the implications of AI with narrative intelligence.

A Survey on Deep Learning-based Pre-Trained Language Models (딥러닝 기반 사전학습 언어모델에 대한 이해와 현황)

  • Sangun Park
    • The Journal of Bigdata / v.7 no.2 / pp.11-29 / 2022
  • Pre-trained language models are the most important and widely used tools in natural language processing tasks. Because they have been pre-trained on a large corpus, high performance can be expected even when fine-tuning with a small amount of data. Since the elements necessary for implementation, such as a pre-trained tokenizer and a deep learning model including pre-trained weights, are distributed together, the cost and time required for natural language processing have been greatly reduced. Transformer variants are the most representative pre-trained language models that provide these advantages, and they are being actively applied in other fields such as computer vision and audio processing. To help researchers understand pre-trained language models and apply them to natural language processing tasks, this paper describes the definition of the language model and the pre-trained language model, and discusses the development of pre-trained language models, focusing on representative Transformer variants.
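
The cost reduction the survey describes comes from the pre-trained tokenizer and pre-trained weights being distributed together, so a task model only needs light fine-tuning. A minimal sketch, assuming a Hugging Face checkpoint and a two-class task, is shown below.

```python
# Minimal sketch of the workflow the survey describes: download a pre-trained tokenizer
# and pre-trained weights together, then fine-tune on a small labeled set.
# The checkpoint and label count are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"            # any Transformer-variant checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # pre-trained tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2                            # pre-trained weights + new task head
)

# A single forward pass over tokenized text; fine-tuning only adjusts these weights
# rather than learning them from scratch.
inputs = tokenizer(
    "Pre-trained language models perform well even with little data.",
    return_tensors="pt",
)
logits = model(**inputs).logits
```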

Coding Helper for Python Beginners based on the Large Language Model(LLM) (대규모 언어 모델(LLM) 기반의 파이썬 입문자를 위한 코딩 도우미)

  • Se-Hoon Lee;Jeong-Bin Choi;Yong-Tae Baek;Sun-Ho Yoon
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.389-390 / 2023
  • This paper proposes a system that uses a large language model (LLM) on a Python coding platform as a tool for checking logic and syntax errors and for debugging. The system feeds the Python code written by the user on the coding platform, together with the resulting error message and a prompt, into the LLM, so that logic and syntax errors can be identified and used for debugging. In particular, the prompt is restricted with beginners in mind to improve ease of use. This helps beginners progress more smoothly through Python coding education and lowers the barrier to entry for Python programming.

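The helper flow the abstract describes, sending the learner's code, the error message, and a restricted beginner-oriented prompt to an LLM, can be sketched as below. The API, model name, and prompt wording are illustrative assumptions, not the system's actual implementation.

```python
# Minimal sketch of the coding-helper flow: the learner's Python code and the resulting
# error message are sent to an LLM with a restricted, beginner-oriented prompt.
# The API, model name, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a Python tutor for beginners. Explain the error in simple terms, "
    "point to the line that causes it, and suggest a small fix. "
    "Do not rewrite the whole program."
)

def explain_error(code: str, error: str) -> str:
    """Return a beginner-friendly explanation of why the given code raised the given error."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Code:\n{code}\n\nError:\n{error}"},
        ],
    )
    return response.choices[0].message.content

buggy = "for i in range(3)\n    print(i)"
print(explain_error(buggy, "SyntaxError: expected ':'"))
```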