• Title/Summary/Keyword: 한국어 감정분석 (Korean sentiment analysis)

Search Result 76, Processing Time 0.019 seconds

Zero-Shot Readability Assessment of Korean ESG Reports using BERT (BERT를 활용한 한국어 지속가능경영 보고서의 제로샷 가독성 평가)

  • Son, Guijin;Yoon, Naeun;Lee, Kaeun
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.456-459 / 2022
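  • In line with recent trends in natural language AI research, this study proposes two methods for assessing the readability of Korean-language reports through semantic analysis with a pre-trained language model. Using the pre-trained model without any additional training, the authors embed sentences as vectors and derive two indicators from them: (1) semantic complexity and (2) intrinsic sentiment volatility. They further confirm that both indicators are positively correlated with the readability of Korean-language reports. The study is significant in that it moves away from existing readability assessment methods, which relied heavily on syntactic analysis and labeled data, and approximates existing readability indices without separate training.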

User Experience Evaluation of Menstrual Cycle Measurement Application Using Text Mining Analysis Techniques (텍스트 마이닝 분석 기법을 활용한 월경주기측정 애플리케이션 사용자 경험 평가)

  • Wookyung Jeong;Donghee Shin
    • Journal of the Korean Society for Information Management / v.40 no.4 / pp.1-31 / 2023
  • This study evaluated the user experience of mobile menstrual cycle measurement applications, which are closely related to women's health, by applying topic modeling together with various other text mining techniques and interpreting the results with a honeycomb model. To evaluate the user experience revealed in application reviews, 47,117 Korean reviews of menstrual cycle measurement applications were collected. Topic modeling was conducted to identify the overall discourse on user experience in the reviews, and text network analysis was conducted to identify the specific experiences within each topic. In addition, sentiment analysis was conducted to understand users' emotional experience. Based on these analyses, a development strategy for menstrual cycle measurement applications was presented in terms of accuracy, design, monitoring, data management, and user management. The study confirmed that the accuracy of menstrual cycle measurement and the monitoring function should be improved and that various design attempts are required. It also confirmed the need to strengthen the management of personal information and users' biometric data. By exploring the user experience (UX) of menstrual cycle measurement applications in depth, this study revealed various factors experienced by users and suggested practical improvements to provide a better experience. It is also significant in that it presents a methodology that combines topic modeling and text network analysis so that researchers can closely examine vast amounts of review data when evaluating user experience.

Hotspot Analysis of Korean Twitter Sentiments (한국어 트위터 감정의 핫스팟 분석)

  • Lim, Joasang;Kim, Jinman
    • Journal of Korea Multimedia Society / v.18 no.2 / pp.233-243 / 2015
  • A hotspot is a spatial pattern in which properties or events are densely concentrated in a particular area. Location information is easily captured with the increasing use of mobile devices, but emotion is not, unless people are asked directly through a survey. Tweets provide a good way of analyzing such spatial sentiment, but relevant research is hard to find. Therefore, we analyzed hotspots of emotion on Twitter using spatial autocorrelation. We extracted 10,142 tweets and their associated GPS data. The sentiment of each tweet was classified as good or bad with a support vector machine. We used Moran's I and Getis-Ord $G_i^*$ for global and local spatial autocorrelation. Some hotspots were found to be significant and were drawn on a map of the Seoul metropolitan area. These results were very similar to those of an earlier official survey of the happiness index.
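
As an illustration of the spatial autocorrelation statistic named in this abstract, the following is a minimal sketch of global Moran's I in plain Python. The function name `morans_i`, the binary neighbor weights, and the toy sentiment values are assumptions made for the example; they are not taken from the paper, which does not describe its weighting scheme here.

```python
def morans_i(values, weights):
    """Global Moran's I for one value per location.

    values  -- list of numbers (e.g. a sentiment score per area; hypothetical)
    weights -- square matrix, weights[i][j] > 0 when areas i and j are neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Four areas along a line; each area neighbors the adjacent ones (toy data).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0]  # neighbors always differ

print(morans_i(clustered, chain))    # positive: spatial clustering
print(morans_i(alternating, chain))  # negative: spatial dispersion
```

Positive values indicate that similar sentiment values cluster in space, while negative values indicate that neighbors tend to differ. A local statistic such as Getis-Ord $G_i^*$ would then be needed to locate individual hotspots.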

Extracting Implicit Customer Viewpoints from Product Review Text (상품 평가 텍스트에 암시된 사용자 관점 추출)

  • Jang, Kyoungrok;Lee, Kangwook;Myaeng, Sung-Hyon
    • Annual Conference on Human and Language Technology / 2013.10a / pp.53-58 / 2013
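  • Online consumers express their opinions about products by leaving product evaluations (reviews) on online store platforms such as amazon.com. Because these reviews strongly influence other consumers' purchase decisions, they are a very important source of information. In sentiment analysis, the field that seeks to automatically extract or analyze the opinions people leave, most previous studies have tried to determine whether users' opinions carry positive or negative sentiment, at granularities ranging from the whole document down to individual product aspects. Although it can be useful to decide whether a consumer judged a product or one of its aspects positively or negatively, this study focuses on automatically extracting the viewpoint from which consumers evaluated the product or aspect. Noting that one of the characteristic functions of an adjective is to assign a value to an attribute of the noun it modifies, the study attempts to extract the attribute of the modified noun, using WordNet for this purpose. To verify the effectiveness of the proposed method, an experiment was conducted with three human evaluators, and the results indicate that this research direction is sufficient to open new possibilities for sentiment analysis.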

A Study on the Reliability and Validity of a Korean translated Multidimensional Experiential Avoidance Questionnaire (한국어판 다차원적 체험회피 질문지의 신뢰도 및 타당도 연구)

  • Jung, Ji-Hyun
    • The Journal of the Korea Contents Association / v.18 no.1 / pp.517-526 / 2018
  • The purpose of this study was to translate the Multidimensional Experiential Avoidance Questionnaire (MEAQ), developed by Gámez, Chmielewski, Kotov, Ruggero, and Watson, and to examine its reliability and validity. First, 285 college students completed the MEAQ. Exploratory factor analysis supported the six-factor structure of the 50 items, and the internal consistency of the 50 items was .91. Then, 315 college students completed the MEAQ, and confirmatory factor analysis confirmed the six-factor structure of the 50 items. Of these, 275 students also completed the Acceptance-Action Questionnaire II, White Bear Suppression Inventory, Toronto Alexithymia Scale, Neuroticism, avoidant coping, CES-D, Beck Anxiety Inventory, Psychological Well-Being Scale, and Satisfaction with Life Scale. Correlations between the MEAQ and these scales supported its convergent, discriminant, and criterion-related validity.
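
The abstract reports an internal consistency of .91 without naming the coefficient. Assuming it refers to Cronbach's alpha, the most common internal-consistency measure, a minimal sketch of that computation looks like this. The function name `cronbach_alpha` and the toy responses are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumption: the reported internal consistency is Cronbach's alpha;
    the abstract itself does not say which coefficient was used.
    """
    k = len(responses[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents, two items (hypothetical scores).
consistent = [[1, 1], [2, 2], [3, 3]]  # items agree perfectly
mixed = [[1, 2], [2, 1], [3, 3]]       # items agree only partly

print(cronbach_alpha(consistent))
print(cronbach_alpha(mixed))
```

Alpha approaches 1 when items rise and fall together across respondents, which is the sense in which a value such as .91 indicates high internal consistency.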

Extracting Multi-type Elements Consisting of Multi-words from Sentences (문장으로부터 여러 단어로 구성된 여러 유형의 요소 추출)

  • Yang, Seon;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2014.10a / pp.73-77 / 2014
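  • Information extraction, the task of automatically extracting from sentences the elements needed for a particular application, is one of the important tasks in natural language processing and text mining. When an element to be extracted consists of multiple words rather than a single word, the number of factors to consider during extraction grows considerably. The elements to be extracted also come in several types: sentiment analysis, for example, requires analysis of elements such as the speaker, object, and attribute, while comparison mining requires analysis of elements such as the comparison subject, the comparison target, and the comparative predicate. This paper proposes a method that simultaneously extracts multiple types of elements, each of which may consist of multiple words. An advantage of the proposed method is that it is very simple to implement: it requires only two steps, morpheme tagging and transformation-based learning, and involves no separate preprocessing such as parsing or chunking. For evaluation, the method was applied to comparison mining, automatically extracting three types of comparison elements, each possibly spanning multiple words, from comparative sentences; the experiments yielded a strong accuracy of 84.33%.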

Generating Sponsored Blog Texts through Fine-Tuning of Korean LLMs (한국어 언어모델 파인튜닝을 통한 협찬 블로그 텍스트 생성)

  • Bo Kyeong Kim;Jae Yeon Byun;Kyung-Ae Cha
    • Journal of Korea Society of Industrial Information Systems / v.29 no.3 / pp.1-12 / 2024
  • In this paper, we fine-tuned KoAlpaca, a large-scale Korean language model, and implemented a blog text generation system utilizing it. Blogs on social media platforms are widely used as a marketing tool for businesses. We constructed training data of positive reviews through emotion analysis and refinement of collected sponsored blog texts and applied QLoRA for the lightweight training of KoAlpaca. QLoRA is a fine-tuning approach that significantly reduces the memory usage required for training, with experiments in an environment with a parameter size of 12.8B showing up to a 58.8% decrease in memory usage compared to LoRA. To evaluate the generative performance of the fine-tuned model, texts generated from 100 inputs not included in the training data produced on average more than twice the number of words compared to the pre-trained model, with texts of positive sentiment also appearing more than twice as often. In a survey conducted for qualitative evaluation of generative performance, responses indicated that the fine-tuned model's generated outputs were more relevant to the given topics on average 77.5% of the time. This demonstrates that the positive review generation language model for sponsored content in this paper can enhance the efficiency of time management for content creation and ensure consistent marketing effects. However, to reduce the generation of content that deviates from the category of positive reviews due to elements of the pre-trained model, we plan to proceed with fine-tuning using the augmentation of training data.

The Characteristics of Malicious Comments: Comparisons of the Internet News Comments in Korean and English (악성 댓글의 특성: 한국어와 영어의 인터넷 뉴스 댓글 비교)

  • Kim, Young-il;Kim, Youngjun;Kim, Youngjin;Kim, Kyungil
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.548-558 / 2019
  • As commenting on internet news has become widespread, malicious comments have spread and caused many social problems. Because writing reflects a person's mental state or traits, analyzing malicious comments makes it possible to infer the mental state of people when they write internet news comments. In this study, we analyzed malicious comments by English and Korean speakers using LIWC and KLIWC. In both English and Korean, malicious comments scored higher than normal comments on sentences, word phrases, morphemes, word phrases per sentence, morphemes per sentence, positive emotion words, and cognitive process words, and lower on third person singular, adjectives, anger words, and emotional process words. This suggests that people are in a state in which they cannot control feelings such as anger and cannot think clearly when they write news comments. Therefore, service providers should consider ways for commenters to monitor their own writing and ways to keep other users away from comments that contain many negative-emotion words. In addition, English and Korean malicious comments were found to be distinguished by authenticity. For greater objectivity, data should be gathered from various points in time.

Character Analysis of the Movie "THE HANDMAIDEN" (영화 [아가씨]의 악인형 분석)

  • Jeong, Moun-Kwon
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.413-420 / 2019
  • The purpose of this study is to analyze the main characters of the movie [THE HANDMAIDEN]. What all four characters in the movie have in common is that they have all seen 'the Existence' and have the makings of a villainous figure. Kouzuki and Hideko had clear and typical symptoms and were diagnosed, using Lacan's psychoanalytic methodology, as having the structures of perversion and neurosis. On the other hand, since Sook-hee and Ko-pandol have a criminal nature, they have long faced up to the existence, and it was difficult to approach them through the structural layers of psychoanalytic methodology. Therefore, the PCL-R diagnosis was used to analyze the personality types of Sook-hee and Ko-pandol, and structural analysis was then added. As a result, Kouzuki was found to be a sadist, Hideko obsessive-compulsive, and Sook-hee to have an antisocial lifestyle while remaining in a normal emotional category. Ko-pandol was noted to be a potential sadist and partly a sociopath.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze large-scale social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train classification models. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that were underestimated in existing sentiment analysis. The result of applying Word2Vec is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that the related words extracted by Word2Vec include about three times as many words expressing some emotion about the keyword as those extracted by co-occurrence analysis. The difference stems from Word2Vec's vectorization of semantic features. It is therefore possible to say that Word2Vec can capture hidden related words that have not been found by traditional analysis. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotion words extracted from the text are then converted into word vectors by Word2Vec to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence.
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords: professor, prosecutor, and doctor, since these keywords carry rich public emotion and opinion. Data collection was conducted in advance to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25720, Doctor: 35110, Prosecutor: 43225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. A future study will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that were not dealt with in this study. These features will be considered in further study.
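
To make the contrast between the two approaches in this abstract concrete, here is a minimal sketch in plain Python. It counts sentence-level co-occurrences for a keyword and, separately, ranks words by cosine similarity between word vectors. The vectors below are hand-made toy values standing in for trained Word2Vec embeddings, and the function names, tokens, and numbers are all assumptions made for illustration. The study's own programs were written in Java and are not reproduced here.

```python
import math
from collections import Counter


def cooccurrence(sentences, keyword):
    """Count words that appear in the same sentence as the keyword."""
    counts = Counter()
    for tokens in sentences:
        if keyword in tokens:
            counts.update(t for t in set(tokens) if t != keyword)
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(vectors, word, topn=2):
    """Rank other words by cosine similarity to the given word's vector."""
    ranked = sorted(
        ((cosine(vectors[word], vec), other)
         for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return [other for _, other in ranked[:topn]]


# Toy tokenized sentences (hypothetical, already POS-filtered).
sentences = [
    ["professor", "angry", "scandal"],
    ["professor", "scandal"],
    ["doctor", "kind"],
]

# Toy 2-d vectors standing in for trained Word2Vec embeddings (hypothetical).
vectors = {
    "angry": [1.0, 0.1],
    "furious": [0.95, 0.05],
    "scandal": [0.9, 0.2],
    "kind": [-1.0, 0.3],
}

print(cooccurrence(sentences, "professor"))
print(most_similar(vectors, "angry"))
```

Co-occurrence counting can only surface words that literally share a sentence with the keyword. Vector similarity can also surface words such as "furious" in this toy example, which never co-occur with the query word but lie close to it in the embedding space.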