• Title/Summary/Keyword: 텍스트 연구 (text research)

Search Result 3,471, Processing Time 0.029 seconds

Studies on the Spread and Impact of the Chinese Traditional Dramas <Seohyanggi> and <Samgukji> in South Korea (중국 전통 희곡 <서향기>와 <삼국지> 한국으로의 전파 및 한국에 끼친 영향)

  • Yuan, Guo
    • Journal of Korea Entertainment Industry Association / v.13 no.6 / pp.79-86 / 2019
  • Drama, as a stage art rooted in traditional ways of life, has built a bridge of communication between the peoples of China and Korea. However, differing performance conventions and cultural backgrounds give the theatrical arts of the two countries different stage forms. This paper studies the dissemination and influence in Korea of the Chinese dramas <Seohyanggi> and <Samgukji>. Using work analysis and text analysis, it traces the history of how these dramas spread, examines the influence of Chinese drama on Korea as a form of cultural transmission, and explores the effect of that transmission and the degree of audience acceptance. The research is significant in its own right and also supports the long-term development of national culture.

A Self-Guided Approach to Enhance Korean Text Generation in Writing Assistants (A Self-Guided Approach을 활용한 한국어 텍스트 생성 쓰기 보조 기법의 향상 방법)

  • Donghyeon Jang;Jinsu Kim;Minho Lee
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.541-544 / 2023
  • Fine-tuning an SLM (Small Language Model) on the outputs of very large models such as ChatGPT and GPT-4 is attracting attention as a cost-effective way to improve LLM (Large-scale Language Model) performance. However, this approach is mainly used to train general-purpose instruction-following models, which leaves room for further improvement in restricted, specific domains. This study proposes the Self-Guided Approach, a new method for improving performance in a specific domain (Writing Assistant). The Self-Guided Approach (1) uses an LLM to score seed data on domain-specific metrics (helpfulness, relevance, accuracy, and level of detail), and (2) fine-tunes an SLM in a supervised manner using both the scored and the unscored data. SLMs trained with the Self-Guided Approach were evaluated with the GPT-4-based automatic evaluation framework proposed in Vicuna. The evaluation showed improved performance over existing training methods that tune on generated instruction data, such as Self-instruct and alpaca. The improvement was confirmed on Korean open-source LLMs at several scales (Polyglot1.3B, PolyGlot3.8B, PolyGlot5.8B). Under automatic evaluation with GPT-4, test-set scores rose from 4.547 to 6.286 in the Korean Novel Generation domain and from 4.038 to 5.795 in the Korean Scenario Generation domain, and similar gains were observed in other related domains. The results confirm that the Self-Guided Approach can improve SLM performance in a specific domain (Writing Assistant). This is expected to greatly reduce the cost burden of LLMs while maintaining performance in restricted domains, and to provide practical help for LLM-based application services.
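The two-step recipe in this abstract (score part of the seed data, then fine-tune on scored and unscored examples together) can be illustrated with a small data-preparation sketch. The abstract does not say how the scores enter the training text, so the tag format, the `build_record` helper, and the metric key names below are all hypothetical.

```python
# Hypothetical metric keys, following the four metrics named in the abstract.
METRICS = ("helpfulness", "relevance", "accuracy", "level_of_detail")


def build_record(instruction, response, scores=None):
    """Format one supervised fine-tuning example.

    When LLM-assigned scores are available they are prepended as a tag line
    (an assumed format); unscored examples are passed through unchanged.
    """
    if scores is None:
        prompt = instruction
    else:
        tags = ", ".join(f"{name}={scores[name]}" for name in METRICS)
        prompt = f"[{tags}]\n{instruction}"
    return {"prompt": prompt, "completion": response}


# Toy seed data: one scored example and one unscored example.
scored = build_record(
    "Write the opening line of a novel.",
    "The rain had not stopped for nine days.",
    scores={"helpfulness": 8, "relevance": 9, "accuracy": 8, "level_of_detail": 6},
)
unscored = build_record("Write a scene heading.", "INT. KITCHEN - NIGHT")
training_set = [scored, unscored]
```

Mixing both kinds of record in one training set mirrors step (2) of the abstract; the actual fine-tuning call is omitted because it depends on the training framework.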


A Study of the Algorithm that Standardizes Processing of Information and Taking Indications of East Asian Medicine Formula (비정형 한의약텍스트 조제복용사항 정형화알고리즘연구 - 동의보감 처방정보를 중심으로)

  • CHA Wung-seok;HEO Yo-seob;Kim Namil
    • The Journal of Korean Medical History / v.35 no.2 / pp.45-67 / 2022
  • Currently, there are about 20,000 known ancient medical texts from the East Asian medical traditions. Although the most famous texts are widely known, many still exist only as original manuscripts. We are interested in exploring these texts to uncover the potential benefits of their therapeutic knowledge. This study aims to develop a database program that automatically converts the treatment skills described in the text version into a more structured version. In the previous study, our team analyzed patterns in the way that treatment skills are described and then tried to design a database program algorithm that identified every meaningful keyword used to describe treatment skills and put that word in the right cell of a structured table. This study continues the development of this program. East Asian medical herbal treatment information is broken down into four elements: the first is the name or title of the treatment skill, the second is the symptoms to which the treatment is applied, the third is the ingredients used, and the fourth is how information is processed and the indications taken. This study presents the algorithm's principles for analyzing and structuring the fourth element, the processing of information and taking of indications, which is described in a form of ancient natural language.

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal / v.12 no.9 / pp.81-88 / 2023
  • Online communication has recently increased as non-face-to-face services spread in response to COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images. Research on multimodal emotion recognition that combines various modalities is actively underway. Within this research, emotion recognition using speech data is attracting attention as a means of understanding emotions through sound and language information, but in most cases emotions are recognized using a single speech feature value. However, because a variety of emotions coexist in a complex manner in a conversation, a method for recognizing multiple emotions is needed. Therefore, in this paper, we propose a multi-emotion regression model that extracts feature vectors after preprocessing speech data to recognize complex, inherent emotions and takes into account the passage of time.

Analysis of Meta Fashion Meaning Structure using Big Data: Focusing on the keywords 'Metaverse' + 'Fashion design' (빅데이터를 활용한 메타패션 의미구조 분석에 관한 연구: '메타버스' + '패션디자인' 키워드를 중심으로)

  • Ji-Yeon Kim;Shin-Young Lee
    • Fashion & Textile Research Journal / v.25 no.5 / pp.549-559 / 2023
  • Along with the transition to the fourth industrial revolution, the possibility of metaverse-based innovation in the fashion field has been confirmed, and various applications are being sought. Therefore, this study performs meaning structure analysis and discusses the prospects of meta fashion using big data. From 2020 to 2022, data including the keyword "metaverse + fashion design" were collected from portal sites (Naver, Daum, and Google), and the results of keyword frequency, N-gram, and TF-IDF analyses were derived using text mining. Furthermore, network visualization and CONCOR analysis were performed using Ucinet 6 to understand the interconnected structure between keywords and their essential meanings. The results were as follows: The main keywords appeared in the following order: fashion, metaverse, design, 3D, platform, apparel, and virtual. In the N-gram analysis, the density between fashion and metaverse words was high, and in the TF-IDF analysis results, the importance of content- and technology-related words such as 3D, apparel, platform, NFT, education, AI, avatar, MCM, and meta-fashion was confirmed. Through network visualization and CONCOR analysis using Ucinet 6, three cluster results were derived from the top emerging words: "metaverse fashion design and industry," "metaverse fashion design and education," and "metaverse fashion design platform." CONCOR analysis was also used to derive differentiated analysis results for middle and lower words. The results of this study provide useful information to strengthen competitiveness in the field of metaverse fashion design.
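The keyword-frequency, N-gram, and TF-IDF steps named in this abstract can be sketched in plain Python. The toy documents and the function names `ngrams` and `tf_idf` are hypothetical, and the weighting shown (relative term frequency times the log of inverse document frequency) is one common TF-IDF variant, not necessarily the one the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: score} dict per document (docs are token lists)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (invented for illustration only).
docs = [
    ["metaverse", "fashion", "design"],
    ["fashion", "platform"],
    ["metaverse", "fashion", "nft"],
]
frequency = Counter(term for doc in docs for term in doc)
bigrams = Counter(gram for doc in docs for gram in ngrams(doc, 2))
weights = tf_idf(docs)
```

In this toy corpus "fashion" occurs in every document, so its TF-IDF weight is zero even though its raw frequency is the highest. This is why frequency ranking and TF-IDF ranking can surface different keywords.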

A Study on the Impact of Speech Data Quality on Speech Recognition Models

  • Yeong-Jin Kim;Hyun-Jong Cha;Ah Reum Kang
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.41-49 / 2024
  • Speech recognition technology is continuously advancing and is widely used in various fields. In this study, we investigated the impact of speech data quality on speech recognition models by comparing the entire dataset with the top 70% of the data ranked by Signal-to-Noise Ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we examined the text transformation results for each model and evaluated them using the Levenshtein Distance. Experimental results revealed that Seamless M4T scored 13.6 with the high-SNR data, which is lower than its score of 16.6 for the entire dataset. However, Google Cloud Speech-to-Text scored 8.3 on the entire dataset, indicating lower performance than with the high-SNR data. This suggests that using data with high SNR during the training of a new speech recognition model can have an impact, and that Levenshtein Distance can serve as a metric for evaluating speech recognition models.
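The Levenshtein Distance used as the evaluation metric here counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, so a lower score means the transcript is closer to the reference. A minimal sketch follows; the function name and the example strings are illustrative and not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


reference = "kitten"
hypothesis = "sitting"
distance = levenshtein(reference, hypothesis)
```

Averaging this distance over all reference/transcript pairs gives a single per-model score of the kind reported in the abstract, though the exact aggregation the authors used is not stated.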

A Study on Speech Synthesizer Using Distributed System (분산형 시스템을 적용한 음성합성에 관한 연구)

  • Kim, Jin-Woo;Min, So-Yeon;Na, Deok-Su;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.209-215 / 2010
  • Portable terminals have recently received attention owing to wireless networks and large-capacity ROM. As a result, TTS (Text-to-Speech) systems are being embedded in portable terminals. High-quality synthesis is difficult on a portable terminal, yet users demand it. In this paper, we propose a Distributed TTS (DTTS) composed of a server and a terminal. Because it builds on corpus-based speech synthesis, the DTTS can deliver high-quality synthesis. The synthesis system in the server generates optimized speech concatenation information after a database search and transmits it to the terminal. The synthesis system in the terminal produces high-quality speech with low computation using the speech concatenation information received from the server. The proposed method can reduce complexity, lower power consumption, and make maintenance more efficient.

A Text Mining Study on Endangered Wildlife Complaints - Discovery of Key Issues through LDA Topic Modeling and Network Analysis - (멸종위기 야생생물 민원 텍스트 마이닝 연구 - LDA 토픽 모델링과 네트워크 분석을 통한 주요 이슈 발굴 -)

  • Kim, Na-Yeong;Nam, Hee-Jung;Park, Yong-Su
    • Journal of the Korean Society of Environmental Restoration Technology / v.26 no.6 / pp.205-220 / 2023
  • This study aimed to analyze the needs and interests of the public on endangered wildlife using complaint big data. We collected 1,203 complaints and their corresponding text data on endangered wildlife, pre-processed them, and constructed a document-term matrix for 1,739 text data. We performed LDA (Latent Dirichlet Allocation) topic modeling and network analysis. The results revealed that the complaints on endangered wildlife peaked in June-August, and the interest shifted from insects to various endangered wildlife in the living area, such as mammals, birds, and amphibians. In addition, the complaints on endangered wildlife could be categorized into 8 topics and 5 clusters, such as discovery report, habitat protection and response request, information inquiry, investigation and action request, and consultation request. The co-occurrence network analysis for each topic showed that the keywords reflecting the call center reporting procedure, such as photo, send, and take, had high centrality in common, and other keywords such as dung beetle, know, absence and think played an important role in the network. Through this analysis, we identified the main keywords and their relationships within each topic and derived the main issues for each topic. This study confirmed the increasing and diversifying public interest and complaints on endangered wildlife and highlighted the need for professional response. We also suggested developing and extending participatory conservation plans that align with the public's preferences and demands. This study demonstrated the feasibility of using complaint big data on endangered wildlife and its implications for policy decision-making and public promotion on endangered wildlife.
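The co-occurrence network step described in this abstract can be sketched without external libraries. The abstract does not state which centrality measure was used, so degree centrality is an assumption here, and the toy documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents each unordered keyword pair shares."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs):
    """Share of the other keywords that each keyword co-occurs with."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {term: 0.0 for term in neighbors}
    return {term: len(linked) / (n - 1) for term, linked in neighbors.items()}


# Toy complaint texts after tokenisation (invented for illustration only).
docs = [
    ["photo", "send", "beetle"],
    ["photo", "take"],
    ["photo", "send"],
]
pairs = cooccurrence(docs)
centrality = degree_centrality(pairs)
```

In the toy data "photo" co-occurs with every other keyword and so receives the maximum centrality of 1.0, analogous to the procedural keywords (photo, send, take) that the study reports as highly central.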

Exploring the phenomenon of veganphobia in vegan food and vegan fashion (비건 음식과 비건 패션에서 나타난 비건포비아 현상에 대한 탐구)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture / v.32 no.3 / pp.381-397 / 2024
  • This study investigates the negative perceptions (veganphobia) held by consumers toward vegan diets and fashion and aims to foster a genuine acceptance of ethical veganism in consumption. The textual data were web-crawled from Korean online posts, including news articles, blogs, forums, and tweets, containing keywords such as "contradiction," "dilemma," "conflict," "issues," "vegan food" and "vegan fashion" from 2013 to 2021. Data analysis was conducted through text mining, network analysis, and clustering analysis using Python and NodeXL programs. The analysis revealed distinct negative perceptions regarding vegan food. Key issues included the perception of hypocrisy among vegetarians, associations with specific political leanings, conflicts between environmental and animal rights, and contradictions between views on companion animals and livestock. Regarding the vegan fashion industry, the eco-friendliness of material selection and design processes was seen as the pivotal factor shaping negative attitudes. Furthermore, the study identified a shared negative perception regarding vegan food and vegan fashion. This negativity was characterized by confusion and conflicts between animal and environmental rights, biased perceptions linked to specific political affiliations, perceived self-righteousness among vegetarians, and general discomfort toward them. These factors collectively contributed to a broader negative perception of vegan consumption. In conclusion, this study is significant in understanding the complex perceptions and attitudes that consumers hold toward vegan food and fashion. The insights gained from this research can aid in the design of more effective campaign strategies aimed at promoting vegan consumerism, ultimately contributing to a more widespread acceptance of ethical veganism in society.

Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
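An ensemble retriever of the kind mentioned in this abstract has to merge ranked lists from several retrievers into one ranking. The abstract does not say which fusion rule the system uses; reciprocal rank fusion (RRF) is shown below as one widely used option, with a hypothetical function name, hypothetical document IDs, and the conventional smoothing constant `k=60` as an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list, best first.

    Each document receives 1 / (k + rank) from every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy rankings from two hypothetical retrievers (e.g. keyword and vector search).
keyword_hits = ["reg_a", "reg_b", "reg_c"]
vector_hits = ["reg_b", "reg_c", "reg_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

A separate re-ranking step, which the abstract also mentions, would then reorder the top of this fused list with a more expensive relevance model; that step is not sketched here.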