• Title/Summary/Keyword: Comment Classification

User Characterization from Replying Comment Structures in Online Discussion (온라인 토론의 댓글 응답 구조를 이용한 사용자 특성 분석)

  • Kim, Sung-Hwan;Tak, Haesung;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.18 no.11 / pp.135-145 / 2018
  • In online communities, users exchange opinions and feelings on various subjects through comments. Comment-based communication is quick and convenient, but this lightweight character sometimes leads users to use impolite and aggressive language, which can escalate into online conflict. It is therefore important to analyze and classify users by their characteristics in order to predict and respond to such trouble. In this paper, we present several quantitative measures for describing the structure of comment trees, based on the assumption that user characteristics can be observed as structural features of the comment trees of the articles on which they commented. We examine the distribution of the proposed measures over article posters and commenters. In addition, we show the effectiveness of the structural features through experiments that distinguish users who have received administrator warnings from benign users.
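
The abstract does not list the specific measures, so the following is only a minimal sketch of what structural measures of a comment tree could look like. The record layout `(comment_id, parent_id, author)` and the four measures (size, maximum depth, maximum width, leaf ratio) are assumptions chosen for illustration, not the measures proposed in the paper.

```python
from collections import defaultdict

def tree_measures(comments):
    """Compute simple structural measures of one article's comment tree.

    comments: list of (comment_id, parent_id, author) tuples (hypothetical layout);
    parent_id is None when the comment replies directly to the article.
    """
    children = defaultdict(list)
    for comment_id, parent_id, _author in comments:
        children[parent_id].append(comment_id)

    depth_of = {}                    # comment_id -> depth (top-level comments have depth 1)
    level_sizes = defaultdict(int)   # depth -> number of comments at that depth
    stack = [(cid, 1) for cid in children.get(None, [])]
    while stack:
        cid, depth = stack.pop()
        depth_of[cid] = depth
        level_sizes[depth] += 1
        stack.extend((child, depth + 1) for child in children.get(cid, []))

    size = len(depth_of)
    leaves = sum(1 for cid in depth_of if not children.get(cid))
    return {
        "size": size,
        "max_depth": max(depth_of.values(), default=0),
        "max_width": max(level_sizes.values(), default=0),
        "leaf_ratio": leaves / size if size else 0.0,
    }

# Two top-level comments; comment 1 has two replies, one of which has a reply.
example = [(1, None, "a"), (2, 1, "b"), (3, 2, "a"), (4, None, "c"), (5, 1, "c")]
measures = tree_measures(example)
```

Measures like these could be aggregated per user over all trees the user took part in and then fed to a classifier as features.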

Comments Classification System using Topic Signature (Topic Signature를 이용한 댓글 분류 시스템)

  • Bae, Min-Young;Cha, Jeong-Won
    • Journal of KIISE:Software and Applications / v.35 no.12 / pp.774-779 / 2008
  • In this work, we describe a comment classification system that uses topic signatures. Topic signatures are widely used for feature selection in document classification and summarization. Comments are short and contain many word-spacing errors and special characters. We first convert comments into 7-grams and treat each 7-gram as a sentence. We then convert each 7-gram into 3-grams and treat each 3-gram as a word. We select key features using topic signatures and classify new inputs with the Naive Bayes method. The experimental results show that the proposed method outperforms previous methods.
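
The two-level n-gram conversion described above can be sketched as follows. This is an illustration under assumptions: the n-grams are taken over characters, and whitespace is stripped first because comment spacing is unreliable. The abstract states neither detail, and the topic-signature selection and Naive Bayes steps are not shown.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of text (the whole text if it is shorter than n)."""
    if not text:
        return []
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def comment_to_pseudo_sentences(comment):
    """Turn a comment into pseudo-sentences (7-grams), each a list of pseudo-words (3-grams)."""
    compact = "".join(comment.split())  # assumption: ignore spacing entirely
    return [char_ngrams(seven_gram, 3) for seven_gram in char_ngrams(compact, 7)]

pseudo_sentences = comment_to_pseudo_sentences("abc defgh")
# "abcdefgh" gives the 7-grams "abcdefg" and "bcdefgh", each split into five 3-grams.
```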

Comments Classification System using Support Vector Machines and Topic Signature (지지 벡터 기계와 토픽 시그너처를 이용한 댓글 분류 시스템 언어에 독립적인 댓글 분류 시스템)

  • Bae, Min-Young;En, Ji-Hyun;Jang, Du-Sung;Cha, Jeong-Won
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2009.02a / pp.263-266 / 2009
  • Compared with general documents, comments are short and often omit word spacing and commas. We convert each 7-gram into 3-grams and select key features using topic signatures. Topic signatures are widely used for feature selection in document classification and summarization. We use an SVM (Support Vector Machine) as the classifier. The experimental results show that the proposed method outperforms previous methods. The proposed system can also be applied to other languages.

Multi-Label Classification Approach to Effective Aspect-Mining (효과적인 애스팩트 마이닝을 위한 다중 레이블 분류접근법)

  • Jong Yoon Won;Kun Chang Lee
    • Information Systems Review / v.22 no.3 / pp.81-97 / 2020
  • Recent sentiment analysis research has focused on single-label classification approaches. However, because a review comment by one person usually covers several topics or aspects, it is better to classify sentiment for each aspect separately. This paper has two purposes. First, based on the fact that one sentence can contain various aspects, aspect mining is performed to classify the emotion for each aspect. Second, we apply a multi-label classification method to analyze two or more dependent variables (output values) at once. To validate the proposed approach, online review comments about musical performances were collected from a domestic online platform, and the multi-label classification approach was applied to the dataset. The results were promising, and the potential of the proposed approach is discussed.
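
To make the multi-label setup concrete, the sketch below encodes the aspect-level sentiments of one review as a binary label vector and scores a prediction with Hamming loss. The aspect names, the aspect:polarity label scheme, and the choice of Hamming loss are assumptions for illustration; the abstract does not state which labels or metrics the authors used.

```python
ASPECTS = ["story", "acting", "music"]  # hypothetical aspect names
LABELS = [f"{aspect}:{polarity}" for aspect in ASPECTS for polarity in ("pos", "neg")]

def encode(aspect_sentiments):
    """Map {aspect: 'pos' | 'neg'} to a binary vector over LABELS; unmentioned aspects stay 0."""
    active = {f"{aspect}:{polarity}" for aspect, polarity in aspect_sentiments.items()}
    return [1 if label in active else 0 for label in LABELS]

def hamming_loss(true_rows, pred_rows):
    """Fraction of individual label slots that differ between truth and prediction."""
    total = sum(len(row) for row in true_rows)
    wrong = sum(t != p
                for true_row, pred_row in zip(true_rows, pred_rows)
                for t, p in zip(true_row, pred_row))
    return wrong / total if total else 0.0

# One review praising the story but criticising the music; the prediction gets the music wrong.
y_true = [encode({"story": "pos", "music": "neg"})]
y_pred = [encode({"story": "pos", "music": "pos"})]
loss = hamming_loss(y_true, y_pred)
```

A single-label classifier would have to pick one overall sentiment for this review, whereas the vector above keeps each aspect's polarity separate.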

The Blog Polarity Classification Technique using Opinion Mining (오피니언 마이닝을 활용한 블로그의 극성 분류 기법)

  • Lee, Jong-Hyuk;Lee, Won-Sang;Park, Jea-Won;Choi, Jae-Hyun
    • Journal of Digital Contents Society / v.15 no.4 / pp.559-568 / 2014
  • Previous polarity classification based on sentiment analysis relies on sentence rules derived from the rating points of product reviews. This is difficult to apply to blogs, which carry no product ratings. Moreover, product reviews can be fabricated by paid commenters and by site managers, so it is hard to find reliable product and store reviews. Considering these problems, analyzing blogs, which contain personal and frank opinions, and classifying their polarity makes it possible to understand opinions about a product or store correctly. This paper proposes extracting high-frequency vocabulary from blogs in several domains and choosing topic words. We then apply a sentiment analysis technique and classify the polarity of blog contents. To evaluate the performance of the sentiment analysis, we use the Precision, Recall, and F-Score measures from the information retrieval field. The evaluation shows that the proposed sentiment analysis classifies polarity better than previous techniques that use sentence rules based on product reviews.
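
For reference, Precision, Recall, and F-Score for a binary polarity task can be computed as in the sketch below. The label names and the example data are made up for illustration and are not taken from the paper; the F-Score shown is the balanced F1 variant, which is an assumption about which F-Score the authors report.

```python
def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall, and F1 for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and predictions for five blog posts.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "pos", "neg", "pos"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```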

A Study on Fine-Tuning and Transfer Learning to Construct Binary Sentiment Classification Model in Korean Text (한글 텍스트 감정 이진 분류 모델 생성을 위한 미세 조정과 전이학습에 관한 연구)

  • JongSoo Kim
    • Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.15-30 / 2023
  • Recently, generative models based on the Transformer architecture, such as ChatGPT, have been gaining significant attention. The Transformer architecture has been applied to various neural network models, including Google's BERT (Bidirectional Encoder Representations from Transformers) sentence generation model. In this paper, a method is proposed for creating a binary text classification model that determines whether a comment in a Korean movie review is positive or negative. To accomplish this, a pre-trained multilingual BERT sentence generation model is fine-tuned and transfer-learned using a new Korean training dataset. The pre-trained multilingual BERT-Base model used covers 104 languages and has 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters. To turn the pre-trained BERT-Base model into a text classification model, the input and output layers were fine-tuned, resulting in a new model with 178 million parameters. Using the fine-tuned model with a maximum word count of 128, a batch size of 16, and 5 epochs, transfer learning is conducted with 10,000 training samples and 5,000 test samples. The resulting binary sentiment classification model for Korean movie reviews has an accuracy of 0.9582, a loss of 0.1177, and an F1 score of 0.81. Transfer learning with a dataset five times larger produced a model with an accuracy of 0.9562, a loss of 0.1202, and an F1 score of 0.86.

An Understanding of Elementary School Students on the Acid-Base, Acid Rain and Soil Acidification (초등학생들의 산-염기, 산성비, 토양산성화에 대한 이해)

  • KIM, Sung-Kyu
    • Journal of Fisheries and Marine Sciences Education / v.27 no.6 / pp.1764-1782 / 2015
  • The purpose of this study is to investigate elementary students' understanding of acid-base, acid rain, and soil acidification. The participants were 280 sixth graders from an elementary school in Gyeongnam Province. The questionnaire consists of four categories: understanding of (a) basic acid-base knowledge, (b) acid rain, and (c) soil acidification, and (d) students' comments on introducing an acid rain experiment into the science textbook. The results are as follows. First, regarding basic acid-base knowledge, students know the classification, characteristics, and properties of acidic and basic solutions well, but they do not know acid-base neutralization, examples that use these properties, or applications in real life. Second, regarding acid rain, students know its definition and the damage it causes, but for lack of knowledge they do not know its causative substances, emission sources, or solutions. Third, soil acidification was well understood by the students because they had continued learning about soil after the lesson on acid rain. We also examined differences by gender and region in the understanding of acid-base, acid rain, and soil acidification. By gender, the percentage of correct answers was higher for female students than for male students. We expected urban students to show higher understanding than rural students, but the understanding of urban students was similar to that of rural students. Fourth, we received both positive and negative answers about introducing the acid rain experiment. The most common positive opinion was "I want to know a lot about the acid rain experiment", followed by "It is possible to prevent the risk of the damage" and "It seems fun and new". The most common negative opinion was "The acid rain experiment may be difficult and complicated", followed by "Just the theory in the book is enough", "The acid rain experiment would be boring and not fun", "The acid rain experiment is dangerous", and "There is already a lot to study".

Improved Feature Extraction Method for the Contents Polluter Detection in Social Networking Service (SNS에서 콘텐츠 오염자 탐지를 위한 개선된 특징 추출 방법)

  • Han, Jin Seop;Park, Byung Joon
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.11 / pp.47-54 / 2015
  • The number of users of SNS such as Twitter and Facebook is increasing with the development of the internet and the spread of mobile devices such as smartphones. At the same time, content pollution problems are increasing: SNS are polluted by posts containing product advertisements, defamatory comments, adult content, and so on. This paper proposes an improved method of extracting content polluter features for detecting content polluters in SNS. In particular, it presents a feature extraction method based on an incremental approach that considers only the increment in the data, rather than batch processing of the entire data, so that feature values for new user data can be extracted efficiently at the stage of predicting and classifying a content polluter. Through experiments, it assesses whether the proposed method maintains classification accuracy and improves time efficiency compared with the batch processing method.
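
The difference between incremental and batch feature extraction can be illustrated with a running mean. This is a sketch of the general idea only: the feature (mean number of URLs per post) and the class name are hypothetical, and the paper's actual features and update rules are not given in the abstract.

```python
class RunningMean:
    """Keeps a per-user mean up to date from new values only, without rescanning old data."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, new_values):
        """Fold in only the newly arrived values and return the updated mean."""
        for value in new_values:
            self.count += 1
            self.total += value
        return self.mean

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

# Hypothetical feature: URLs per post for one user, arriving in two increments.
urls_per_post = RunningMean()
urls_per_post.update([2, 0, 1])         # initial posts
incremental = urls_per_post.update([3])  # one new post; earlier posts are not reprocessed

# Batch processing would recompute over the entire history instead.
history = [2, 0, 1, 3]
batch = sum(history) / len(history)
```

Both routes give the same feature value here, but the incremental update costs time proportional to the new data rather than to the full history.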

Blurring of Swear Words in Negative Comments through Convolutional Neural Network (컨볼루션 신경망 모델에 의한 악성 댓글 모자이크처리 방안)

  • Kim, Yumin;Kang, Hyobin;Han, Suhyun;Jeong, Hieyong
    • Journal of Korea Society of Industrial Information Systems / v.27 no.2 / pp.25-34 / 2022
  • With the development of online services, the ripple effect of negative comments is increasing, and the damage from cyber violence is rising. Various methods, such as filtering based on forbidden words and reporting systems, are used to prevent this, but eradicating negative comments remains challenging. Therefore, this study aimed to increase the accuracy of negative comment classification using deep learning and to blur the parts corresponding to profanity. Training under two different conditions helped decide the number of deep learning layers and filters. An accuracy of 88% was confirmed with 90% of the dataset used for training and 10% for testing. In addition, Grad-CAM enabled us to find and blur the location of swear words in negative comments. Whereas the accuracy of classifying comments based on simple forbidden words was 56%, blurring negative comments through the deep learning model was found to be more effective.
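
The forbidden-word baseline mentioned above can be sketched as a simple dictionary filter. The word list, function name, and masking with asterisks are assumptions for illustration; this is not the paper's CNN and Grad-CAM method, only the kind of simple filter it is compared against.

```python
import re

FORBIDDEN = {"darn", "heck"}  # hypothetical placeholder word list

def mask_forbidden(comment, forbidden=FORBIDDEN):
    """Replace each listed word with asterisks; return (masked_text, was_flagged)."""
    if not forbidden:
        return comment, False
    alternatives = "|".join(re.escape(word) for word in sorted(forbidden))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    masked, hits = pattern.subn(lambda match: "*" * len(match.group(0)), comment)
    return masked, hits > 0

masked, flagged = mask_forbidden("Well, darn it")
evaded, evaded_flag = mask_forbidden("Well, d4rn it")  # a trivial misspelling slips through
```

The second call shows the weakness of exact-match filtering: a slightly altered spelling is neither masked nor flagged, which is one plausible reason such a baseline scores low.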

Classification of Traffic Information Announcement Considering Cognitive Characteristics for Traffic Situations (교통상황별 인지특성을 고려한 교통정보 방송멘트의 분류에 관한 연구)

  • Hwang, Seong-Min;Lee, Byung-Joo;Suh, Seung-Hwan;Sung, Soo-Lyeon;NamGung, Moon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.3 / pp.1-11 / 2010
  • Traffic broadcasting uses conventional traffic information announcements when giving information to road users. To provide information that is useful to drivers, clear criteria for judging information from informers need to be established from the users' perspective. In this study, to provide usable criteria for current announcements, which often cause confusion, cognitive characteristics were investigated and analyzed based on the judgment criteria commonly felt by correspondents, participants in traffic broadcasting, and drivers. The results call for providing information based on average speed, for which drivers feel little cognitive difference, and yield a classification in which smooth traffic flow is above 60 km/h, slow traffic is 40-60 km/h, and congestion is below 40 km/h. In a study of 35 traffic information announcements for different traffic situations, 8 cases of smooth flow and 9 cases of congestion were clearly classified, but the remaining 18 cases were perceived ambiguously by drivers, which indicates the need for announcements that directly use the words 'smooth', 'slow', and 'congestion' when describing slow driving. Future studies should focus on establishing more definite criteria by reproducing near-real traffic flow, providing traffic information announcements, and analyzing cognitive responses using car dynamic simulators and similar tools.
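
The speed-based classification reported above maps directly to a threshold function. The function name is hypothetical, and treating exactly 40 km/h and 60 km/h as 'slow' follows the abstract's wording ("more than 60", "40~60", "less than 40").

```python
def classify_traffic(avg_speed_kmh):
    """Map an average speed in km/h to the announcement word suggested by the study."""
    if avg_speed_kmh > 60:
        return "smooth"
    if avg_speed_kmh >= 40:
        return "slow"
    return "congestion"

states = [classify_traffic(speed) for speed in (75, 60, 40, 39.9)]
```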