• Title/Summary/Keyword: 텍스트 연구 (text research)

Search Result 3,492

Llama2 Cross-lingual Korean with instruction and translation datasets (지시문 및 번역 데이터셋을 활용한 Llama2 Cross-lingual 한국어 확장)

  • Gyu-sik Jang;Seung-Hoon Na;Joon-Ho Lim;Tae-Hyeong Kim;Hwi-Jung Ryu;Du-Seong Chang
    • Annual Conference on Human and Language Technology / 2023.10a / pp.627-632 / 2023
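  • Large language models have drawn attention in natural language processing for their outstanding performance, which rests on high computational capacity and vast amounts of data. Although these models can process text across many languages and domains, Korean still makes up only a tiny share of their training data. As a result, large language models understand and process Korean relatively poorly compared with major languages such as English. Focusing on this problem, this paper proposes a method for improving the Korean-language capability of large language models. In particular, it explores cross-lingual transfer learning, through which the model transfers its knowledge of other languages to Korean. With this approach, the model substantially improved its Korean processing ability while minimizing the loss of performance in the languages it already handled. In experiments, the model trained with this technique more than doubled the performance of the original model on the NSMC dataset and showed marked gains in understanding complex Korean structures and context. This work is expected to contribute to better application of large language models to Korean.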

Comparative analysis of large language model Korean quality based on zero-shot learning (Zero-shot learning 기반 대규모 언어 모델 한국어 품질 비교 분석)

  • Yuna Hur;Aram So;Taemin Lee;Joongmin Shin;JeongBae Park;Kinam Park;Sungmin Ahn;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.722-725 / 2023
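  • A large language model (LLM) is a deep learning algorithm that, using knowledge acquired by training on large-scale data, can recognize, summarize, translate, predict, and generate text and other content. The first publicly released LLMs were English-based and could not be expected to perform well in non-English-speaking regions, which has spurred independent LLM research and development in countries such as Korea and China. To examine whether language affects LLM performance, this paper compares Korean-based and English-based LLMs on four KoBEST tasks. The results confirm that adding prior knowledge of Korean affects LLM performance.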

A Comparative Study on the Types and its Importance of Trade Claims between China and the United States: Using Text Mining Techniques (중국과 미국의 무역클레임 유형과 중요도 비교 연구 : 텍스트 마이닝 기법을 활용하여)

  • Cheon Yu;Yun-Seop Hwang
    • Korea Trade Review / v.47 no.3 / pp.177-190 / 2022
  • This study aims to identify national-level differences in the types and importance of trade claims. The analysis data consist of abstracts of arbitral awards and court judgments published on the website of the United Nations Commission on International Trade Law. The target countries are China and the United States, with 102 cases from China and 59 from the United States. Trade claims were categorized by applying topic modeling to the collected Chinese and U.S. decisions, and the importance of each type was measured with a network centrality index derived through semantic network analysis. The results are as follows. First, the main types of trade claims were the same in both countries: product nonconformity, delivery issues, and payments. Second, their relative importance differed: in China the ranking was product nonconformity > delivery issues > payments, whereas in the United States it was payments > product nonconformity > delivery issues. This study is significant in that it presents a strategic trade claim management plan based on a quantitative methodology.

A Study on the Failure Experiences of Online Fashion Shopping Mall Startups -Applying Text Mining and Grounded Theory- (온라인 패션 쇼핑몰 창업의 실패 경험에 관한 연구 -텍스트 마이닝과 근거이론을 적용하여-)

  • Min Jeong Seo
    • Journal of the Korean Society of Clothing and Textiles / v.47 no.6 / pp.1096-1112 / 2023
  • More entrepreneurs who launched online fashion shopping malls have failed than have succeeded. Recognizing the importance of research that reflects this reality, this study explores entrepreneurs' experiences as their online fashion shopping malls failed. Both of its two studies drew on YouTube videos documenting such failures. Study 1 applied text mining techniques, including high-frequency word analysis and topic modeling, while Study 2 used a qualitative research method, grounded theory. Study 1 identified the most prominent experiences of operating an online fashion shopping mall, while Study 2 provided a holistic view of the failure process. Taken together, the findings show that entrepreneurs' passion for fashion motivates them to open online fashion shopping malls, but they encounter numerous challenges once operating. Insufficient business preparation and operational capability keep them from reaching their financial goals. Despite efforts to boost sales and profit, entrepreneurs often close their businesses because of insufficient funds and waning motivation. The findings shed light on the operational challenges that online fashion shopping malls face and offer insights for developing new strategies to sustain and improve them.

A Study on Perceptions of Virtual Influencers through YouTube Comments -Focusing on Positive and Negative Emotional Responses Toward Character Design- (유튜브 댓글을 통해 살펴본 버추얼 인플루언서에 대한 인식 연구 -캐릭터 디자인에 대한 긍부정 감성 반응을 중심으로-)

  • Hyosun An;Jiyoung Kim
    • Journal of the Korean Society of Clothing and Textiles / v.47 no.5 / pp.873-890 / 2023
  • This study analyzed users' emotional responses to the character design of virtual influencers (VIs) through YouTube comments. We applied text mining to 116,375 comments, focusing on terms related to VI character design and characteristics. Using a BERT model for sentiment analysis, we classified the comments as extremely negative, negative, neutral, positive, or extremely positive. Next, we conducted a co-occurrence frequency analysis on the extremely negative and extremely positive comments to examine the semantic relationships between character design terms and emotional characteristic terms. We also performed a content analysis of comments about Miquela and Shudu to compare how the two character designs are perceived. The results indicate that form elements (e.g., voice, face, and skin) and behavioral elements (e.g., speaking, interviewing, and reacting) are vital in eliciting users' emotional responses. Notably, in negative responses, users focused on the humanization of the voice and the authenticity of behaviors such as speaking, interviewing, and reacting. Furthermore, we found that the character design elements and characteristics users expect differ depending on the VI's field of activity. Accordingly, this study suggests ways to apply these findings to character design so that it accommodates these differences.
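Co-occurrence frequency analysis, as described in the abstract above, counts how often two terms appear in the same comment. The following is a minimal Python sketch, not the authors' implementation: it assumes the comments have already been tokenized and filtered to the terms of interest (a real Korean pipeline would need a morphological analyzer first), and the function name `cooccurrence_counts` and the sample comments are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count unordered term pairs that appear together in the same document.

    Each document contributes a given pair at most once, so a word repeated
    inside one comment does not inflate the total.
    """
    counts = Counter()
    for tokens in docs:
        # Deduplicate and sort so ("face", "voice") and ("voice", "face")
        # map to the same key.
        unique_terms = sorted(set(tokens))
        for pair in combinations(unique_terms, 2):
            counts[pair] += 1
    return counts


# Hypothetical, pre-tokenized comments (not data from the study).
comments = [
    ["voice", "face", "creepy"],
    ["voice", "creepy"],
    ["face", "skin", "beautiful"],
]

for (a, b), n in cooccurrence_counts(comments).most_common(3):
    print(a, b, n)
```

Counting each pair at most once per comment is one common convention; counting every occurrence instead would give long, repetitive comments more weight.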

Text-Mining Analysis of Korea Government R&D Trends in Construction Machinery Domains (텍스트 마이닝을 통한 건설기계분야 국내 정부 R&D 연구동향 분석)

  • Bom Yun;Joonsoo Bae
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.spc / pp.1-8 / 2023
  • To investigate the direction of national science and technology policy in the construction machinery field, we analyzed projects selected by the government as national research and development (R&D) initiatives. Assuming that project titles contain the key keywords, text mining was applied to the titles to test this assumption. Project information spanning the nine years from 2014 to 2022 was collected through the National Science & Technology Information Service (NTIS). To observe changes over time, the period was divided into three-year intervals. To analyze research trends efficiently, keywords were categorized into three groups: 'equipment,' 'smart,' and 'eco-friendly.' Keyword frequency analysis, N-gram analysis, and topic modeling were then performed on the collected data. The findings indicate that domestic government R&D in the construction machinery field focuses primarily on smart-related research and development; in particular, investment in monitoring systems and autonomous operation technologies is increasing. This study is significant in that it analyzes research trends objectively using big data analysis techniques, and it is expected to contribute to future R&D planning, strategy formulation, and project management.
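The keyword frequency and N-gram steps, together with the three-year grouping, can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline: titles are split on whitespace (Korean NTIS titles would normally go through a morphological analyzer first), and the helper names `period_label`, `ngrams`, and `ngram_frequencies` and the sample titles are hypothetical.

```python
from collections import Counter


def period_label(year, start=2014, width=3):
    """Map a year to its three-year window, e.g. 2015 -> '2014-2016'."""
    lo = start + (year - start) // width * width
    return f"{lo}-{lo + width - 1}"


def ngrams(tokens, n):
    """Return the contiguous n-token sequences in `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(titles, n):
    """Count n-grams across lowercased, whitespace-tokenized titles."""
    counts = Counter()
    for title in titles:
        counts.update(ngrams(title.lower().split(), n))
    return counts


# Hypothetical (year, title) records, not real NTIS data.
projects = [
    (2015, "Autonomous operation system for excavators"),
    (2021, "Smart monitoring system for cranes"),
    (2022, "Autonomous operation technology for loaders"),
]

# Group titles by three-year period, then count bigrams within each period.
by_period = {}
for year, title in projects:
    by_period.setdefault(period_label(year), []).append(title)

for period, titles in sorted(by_period.items()):
    print(period, ngram_frequencies(titles, 2).most_common(2))
```

Setting `n=1` yields plain keyword frequencies, so one counting routine covers both the frequency and the N-gram analysis.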

Understanding of the Overview of Quality 4.0 Using Text Mining (텍스트마이닝을 활용한 품질 4.0 연구동향 분석)

  • Kim, Minjun
    • Journal of Korean Society for Quality Management / v.51 no.3 / pp.403-418 / 2023
  • Purpose: The acceleration of technological innovation, specifically Industry 4.0, has given rise to a quality management paradigm known as Quality 4.0. This study aims to provide a systematic overview of the studies on Quality 4.0 dispersed across various disciplines and to stimulate further academic discussion and industrial transformation. Methods: Text mining and machine learning approaches are applied to identify key research topics, and the key references they suggest are reviewed manually to develop a state-of-the-art overview of Quality 4.0. Results: 1) A total of 27 key research topics were identified from an analysis of 1,234 research papers related to Quality 4.0. 2) The relationships among the 27 key research topics were identified. 3) A multilevel framework consisting of the technological enablers, business methods and strategies, goals, and application industries of Quality 4.0 was developed. 4) The trends of the key research topics were analyzed. Conclusion: The 27 key research topics and the Quality 4.0 framework contribute to a better understanding of Quality 4.0. This research lays the groundwork for future academic and industrial advances in the field and encourages further discussion and transformation within industry.

A Study on the Effective Command Delivery of Commanders Using Speech Recognition Technology (국방 분야에서 전장 소음 환경 하에 음성 인식 기술 연구)

  • Yeong-hoon Kim;Hyun Kwon
    • Convergence Security Journal / v.24 no.2 / pp.161-165 / 2024
  • Speech recognition models have advanced rapidly in recent years, along with various speech processing technologies for obtaining high-quality data. In the defense sector, efforts are under way to integrate technologies that effectively remove noise from speech recorded in noisy battlefield conditions and enable efficient speech recognition. This paper proposes a method for effective speech recognition amid the diverse noise of a battlefield, so that commanders can convey orders. The proposed method first removes noise from the noisy speech and then converts it to text with OpenAI's Whisper model. Experimental results show that the proposed method reduces the Character Error Rate (CER) by 6.17% compared with an existing method that does not remove noise. Potential applications of the proposed method in the defense sector are also discussed.
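The Character Error Rate is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the number of reference characters: CER = (S + D + I) / N, with S substitutions, D deletions, I insertions, and N reference characters. A minimal Python sketch of that standard definition follows; it is not the authors' evaluation code, the example strings are hypothetical, and it counts spaces as characters (toolkits differ on whether whitespace is kept, which matters especially for Korean).

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (dynamic programming, one row at a time)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from ref
                curr[j - 1] + 1,         # insert h into ref
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: (S + D + I) / N, with N = len(reference)."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical spoken command and noisy transcription.
reference = "move to point alpha"
hypothesis = "move to paint alpa"
print(f"CER = {cer(reference, hypothesis):.3f}")  # CER = 0.105
```

Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference.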

A Study on Recognition of Robot Barista Using Social Media Text Mining (소셜미디어 텍스트마이닝을 활용한 로봇 바리스타 인식 탐색 연구)

  • Han Jangheon;An Kabsoo
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.37-47 / 2024
  • The food tech market, which applies artificial intelligence robots to the restaurant industry, is gradually expanding. Among its applications, the robot barista, a representative food tech case in the restaurant industry, improves operator efficiency and gives visitors something to watch and enjoy, while allowing 24-hour unmanned operation. This study used text mining analysis to examine trends related to robot baristas in the restaurant industry. The results are as follows. First, keywords such as coffee, cafe, certification, ordering, taste, interest, people, robot cafe, coffee barista expert, free, course, unmanned, and wine sommelier appeared most frequently. Second, time, variety, possibility, people, process, operation, service, and thought showed high closeness centrality. Third, CONCOR analysis produced a total of five keyword clusters highly relevant to the restaurant industry. To promote robot baristas in the future, more attention should be paid to developing and strengthening their functions and features, as well as to online promotion of robot barista cafes through various events and social media.
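Closeness centrality, one of the measures reported above, scores a keyword by how few steps it takes to reach every other keyword in the co-occurrence network: for a connected graph with n nodes, C(v) = (n - 1) / sum of d(v, u) over all other nodes u. The Python sketch below computes this with breadth-first search on an unweighted graph. It is an illustration, not the tool the authors used; it assumes the network is connected, the keyword graph is hypothetical, and CONCOR clustering is not shown.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness of each node = (n - 1) / sum of shortest-path distances.

    `graph` maps each keyword to the set of keywords it co-occurs with.
    Edges are treated as unweighted and the graph is assumed connected.
    """
    n = len(graph)
    scores = {}
    for source in graph:
        # Breadth-first search yields hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total = sum(dist.values())
        scores[source] = (n - 1) / total if total else 0.0
    return scores


# Hypothetical keyword network; each edge is listed in both directions.
graph = {
    "robot": {"coffee", "service", "time"},
    "coffee": {"robot", "taste"},
    "service": {"robot", "time"},
    "time": {"robot", "service"},
    "taste": {"coffee"},
}

ranked = sorted(closeness_centrality(graph).items(), key=lambda kv: -kv[1])
for keyword, score in ranked:
    print(f"{keyword:8s} {score:.3f}")
```

Dividing by n - 1 keeps scores in the range (0, 1], where 1 means a keyword is directly linked to every other keyword; libraries such as NetworkX additionally offer a correction for disconnected graphs.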

A Study of Consumer Perception on Freediving Suits Utilizing Big Data Analysis (빅데이터 분석을 활용한 프리다이빙 슈트에 대한 소비자 인식 연구)

  • Ji-Eun Kim;Eunyoung Lee
    • Journal of the Korea Fashion and Costume Design Association / v.26 no.2 / pp.87-99 / 2024
  • Freediving, an underwater leisure sport in which divers descend without a breathing apparatus, has gained popularity among younger people through images and videos that spread virally on social media platforms. This study applies big data analysis techniques, including text mining, Latent Dirichlet Allocation (LDA) topic analysis, and opinion mining, to explore the keywords associated with freediving suits over the past five years. It aims to analyze the rapidly evolving market trends for freediving suits and increasingly complex and diverse consumer perceptions, and thereby to provide foundational data for activating the freediving suit market and developing strategies for its sustained growth. The study identified the keyword 'size' in connection with freediving suits and conducted opinion mining on 'freediving suit sizes'. Although positive sentiment outweighed negative sentiment, negative keywords were also extracted, indicating the need to understand and mitigate the negative factors associated with size. The findings offer guidelines for advancing the freediving suit market and enhancing consumer satisfaction, and contribute foundational data for the market's continuous growth strategies.