• Title/Summary/Keyword: text-based classification (텍스트기반 분류)

Generative AI based Emotion Analysis of Consumer Reviews Using the Emotion Wheel (생성 AI 기반 감정 수레바퀴 모델을 활용한 사용자 리뷰 감정 분석)

  • Yu Rim Park;Hyon Hee Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.1204-1205 / 2023
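  • This paper proposes a new sentiment analysis method based on consumer review data. Traditional sentiment analysis, which classifies text as positive, negative, or neutral, has difficulty capturing subtle differences in the emotions expressed in text. This study therefore uses a GPT model to subdivide the user's emotions in a text into eight categories. In reviews with negative sentiment, specific emotions such as anger, disgust, and disappointment could be identified intuitively, along with the intensity of each emotion. The proposed method is expected to let companies recognize customer requirements accurately and to contribute to customer-tailored service improvements.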

Multimodal Media Content Classification using Keyword Weighting for Recommendation (추천을 위한 키워드 가중치를 이용한 멀티모달 미디어 콘텐츠 분류)

  • Kang, Ji-Soo;Baek, Ji-Won;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.9 no.5 / pp.1-6 / 2019
  • As the mobile market expands, a variety of platforms provide multimodal media content. Because multimodal media content contains heterogeneous data, users need considerable time and effort to select the content they prefer. In this paper, we therefore propose multimodal media content classification using keyword weighting for recommendation. The proposed method applies keyword weighting to the text data of multimodal media content and extracts the keywords that best represent the content. Based on the extracted data, genre classes with subclasses are generated, and multimodal media content is classified into the appropriate class. In addition, the user's preferences are evaluated for personalized recommendation, and multimodal content is recommended based on the resulting content preference analysis. The performance evaluation verifies the quality of the recommendation results in terms of accuracy and satisfaction. The recommendation accuracy is 74.62% and the satisfaction rate is 69.1%, because the recommendation considers the user's favorite keywords as well as the genre.
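
The abstract above does not say which keyword weighting scheme is used. As a minimal sketch, assuming plain TF-IDF stands in for the weighting and using made-up toy documents, keyword extraction per content item could look like this (the function name `tfidf_keywords` and the data are hypothetical):

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-weighted terms for each tokenized document.

    TF-IDF is an assumption here; the paper's actual weighting is not given.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        weights = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest weight first; ties are broken alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        results.append(ranked[:top_k])
    return results

# Toy text descriptions of three content items (illustrative only).
docs = [
    "space battle alien ship space".split(),
    "love story wedding love".split(),
    "alien love story".split(),
]
keywords = tfidf_keywords(docs)
```

The extracted keywords could then serve as features for assigning genre classes and subclasses, which is the step the abstract describes next.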

A Spam Mail Classification Using Link Structure Analysis (링크구조분석을 이용한 스팸메일 분류)

  • Rhee, Shin-Young;Khil, A-Ra;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.30-39 / 2007
  • Existing content-based spam mail filtering algorithms have difficulty filtering spam when e-mails contain images but little text. In this paper we propose an efficient spam mail classification algorithm that utilizes the link structure of e-mails. We compute the number of hyperlinks in an e-mail and the in-link frequencies of the web pages hyperlinked in the e-mail. Using these two features, we classify spam mails and legitimate mails with a decision tree trained for spam mail classification. We also suggest a hybrid system that combines three different algorithms by majority voting: the link structure analysis algorithm; a modified link structure analysis algorithm, in which only the host part of the hyperlinked pages of an e-mail is used for link structure analysis; and the content-based method using SVM (support vector machines). The experimental results show that the link structure analysis algorithm slightly outperforms the existing content-based method, with an accuracy of 94.8%. Moreover, the hybrid system achieves an accuracy of 97.0%, which is a significant performance improvement over the existing method.
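
The majority-voting hybrid described above can be sketched as follows. This is only an illustration: the three voter functions are hand-written stand-ins for the trained decision trees and the SVM, and every threshold, including the assumption that spam tends to link to rarely linked pages, is hypothetical rather than taken from the paper.

```python
from collections import Counter

def link_structure_vote(num_links, mean_inlinks):
    """Stand-in for the trained decision tree over the two link features."""
    # Hypothetical rule: many links to rarely linked pages looks like spam.
    return "spam" if num_links >= 5 and mean_inlinks < 10 else "legitimate"

def host_only_vote(num_links, mean_host_inlinks):
    """Same idea, but in-links are counted for the host part of each URL only."""
    return "spam" if num_links >= 5 and mean_host_inlinks < 100 else "legitimate"

def content_vote(spam_score):
    """Stand-in for the content-based SVM; spam_score is a hypothetical value in [0, 1]."""
    return "spam" if spam_score > 0.5 else "legitimate"

def majority_vote(votes):
    """Return the label most voters chose (three voters, two labels, so no ties)."""
    return Counter(votes).most_common(1)[0][0]

# One illustrative e-mail: many links, obscure pages, but a popular host.
votes = [
    link_structure_vote(num_links=8, mean_inlinks=2),
    host_only_vote(num_links=8, mean_host_inlinks=3000),
    content_vote(spam_score=0.7),
]
label = majority_vote(votes)
```

With an odd number of voters and two labels, the vote always resolves, so a single dissenting classifier (here the host-only variant) cannot change the outcome.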

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of data has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon.
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
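
To illustrate what "using a sentiment lexicon" means for opinion classification, here is a minimal sketch. The tiny lexicon, the scoring rule, and the use of the absolute score as a certainty proxy are all assumptions for illustration and are not the authors' model.

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "great": 1, "wonderful": 1, "moving": 1,
    "boring": -1, "terrible": -1, "dull": -1,
}

def lexicon_score(text):
    """Sum the polarity of every lexicon word found in the text."""
    tokens = (token.strip(".,!?") for token in text.lower().split())
    return sum(LEXICON.get(token, 0) for token in tokens)

def classify(text):
    """Return (label, certainty); a score of 0 falls back to 'positive' arbitrarily."""
    score = lexicon_score(text)
    label = "positive" if score >= 0 else "negative"
    return label, abs(score)

good = classify("A wonderful, great film!")
bad = classify("Boring and terrible.")
```

Because the decision is a sum over listed words, each prediction can be explained by pointing at the words that contributed, and the same lexicon can be reused on a similar domain without retraining, which matches the two advantages the abstract names.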

Development of a Hypertext-based Polychotomous Key for the Identification of Planthoppers Caught by Light Trap in Paddy Fields (논에 설치한 유아등에 채집되는 멸구류 동정을 위한 하이퍼텍스트 기반 검색표 개발)

  • 김황용;박창규;한만위;엄기백;우건석
    • Korean Journal of Applied Entomology / v.41 no.2 / pp.75-83 / 2002
  • A hypertext-based polychotomous key on the WWW (World Wide Web) was developed to improve the identification accuracy of planthoppers caught by light traps in Korean paddy fields. Its effect was tested with 12 students who were not familiar with insect identification. When they used the hypertext, their ability to recognize Sogatella furcifera (Horváth) and Laodelphax striatellus (Fallén) improved. Identification accuracy for the former increased significantly from 56% to 83%, and that for the latter increased significantly from 47% to 80%. However, many students still had difficulty recognizing Nilaparvata lugens (Stål).

Function Prediction of Gene products by Term based Probabilistic Model (단어 기반의 확률 모델을 이용한 단백질 기능 예측)

  • Park, Dae-Won;Kwon, Hyuk-Chul
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.73-78 / 2003
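  • Proteins being identified through genetic research each have functional characteristics, and they influence and interact with one another. A protein's functional characteristic is the function it performs in an organism, and protein names are assigned so that they accurately express these functions. For proteins named according to their functional characteristics, the words that make up a protein name are also likely to carry functional characteristics similar to those of the protein. This follows from the importance that words have in text-based research. In this paper, we classify the words that make up protein names by the functional characteristics of the proteins, and then use the resulting function distribution to predict and judge protein function in reverse.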
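
The idea of learning a function distribution per word and then predicting in reverse can be sketched as follows. The abstract does not give the actual probabilistic model, so the scoring rule (summing each word's relative function frequencies) and the toy protein names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def train(named_proteins):
    """Count, for every word in a protein name, which functions it co-occurs with."""
    word_func = defaultdict(Counter)
    for name, function in named_proteins:
        for word in name.lower().split():
            word_func[word][function] += 1
    return word_func

def predict(word_func, name):
    """Sum each known word's function distribution and return the best function."""
    scores = Counter()
    for word in name.lower().split():
        counts = word_func.get(word)
        if not counts:
            continue  # Words never seen in training carry no evidence.
        total = sum(counts.values())
        for function, count in counts.items():
            scores[function] += count / total
    if not scores:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)

# Toy training set (illustrative names and function labels).
model = train([
    ("serine protein kinase", "phosphorylation"),
    ("tyrosine kinase receptor", "phosphorylation"),
    ("glucose transporter protein", "transport"),
    ("sodium channel transporter", "transport"),
])
prediction = predict(model, "calcium transporter")
```

A name made up entirely of unseen words yields `None`, which makes the model's lack of evidence explicit instead of guessing.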

PDA-based Mobile Commerce Service (PDA 기반의 Mobile Commerce서비스)

  • 김완식
    • Proceedings of the CALSEC Conference / 2002.01a / pp.362-366 / 2002
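  • General definition: "all forms of transactions conducted over online networks." OECD (1997): "Electronic commerce generally refers to all kinds of transactions related to commercial activities, involving both individuals and organizations, that are based on the processing and transmission of digital data, including text, sound, and images." Classification of EC by economic agent: Business to Business, Business to Consumer, Consumer to Consumer, Government to Business, Government to Consumer, Business to Dealer, and Internet business Site to Site. Definition of Mobile Commerce, general definition: all electronic transactions of goods, services, information, and digital content based on personal portable terminals, such as mobile phones, PDAs, and notebook computers, and on wireless communication networks (remainder omitted)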

Information Extraction Using the Ontology (온톨로지를 이용한 정보 추출)

  • Kim, In-Su;Lee, Bog-Ju
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.652-654 / 2005
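  • Information extraction is the field of extracting structured information from unstructured data in text form. Whereas existing information extraction is syntax-centered, this paper attempts meaning-based information extraction using the Semantic Web and ontologies. The paper also classifies existing information extraction models and presents a new model, semi-automatic information extraction. Based on this model, we develop and introduce an information extraction tool that automatically structures personal information.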

A Distributed Processing Model for Automatic Classification of Text Documents based Personalized Information Using RTI (RTI 통신을 이용한 개인환경기반 자동문서 분산처리 기술)

  • In, Joo-Ho;Kim, Myung-Kyu;Chae, Soo-Hoan
    • Proceedings of the Korean Information Science Society Conference / 2007.06d / pp.365-369 / 2007
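  • As the Internet has become widespread and the amount of text information available online has surged, effective information management and retrieval for scattered documents is required. Automatic document classification is the task of automatically assigning documents to predefined categories based on their content, which enables efficient information management and retrieval. However, automatic document classification requires a distributed environment for collecting and storing vast amounts of data. In this paper, we build the distributed environment for automatic document classification as a distributed system based on RTI (Run Time Infrastructure).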

Methodology for Classifying Hierarchical Data Using Autoencoder-based Deeply Supervised Network (오토인코더 기반 심층 지도 네트워크를 활용한 계층형 데이터 분류 방법론)

  • Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.185-207 / 2022
  • Recently, with the development of deep learning technology, research that applies deep learning algorithms to unstructured data such as text and images has been actively conducted. Text classification has long been studied in academia and industry, and various attempts have been made to exploit data characteristics to improve classification performance. In particular, the hierarchical relationship of labels has been used for hierarchical classification. However, the top-down approach mainly used for hierarchical classification has the limitation that a misclassification at a higher level blocks the opportunity for correct classification at a lower level. Therefore, in this study, we propose a methodology for classifying hierarchical data using an autoencoder-based deeply supervised network, in which high-level classification does not block low-level classification while the hierarchical relationship of labels is still considered. The proposed methodology attaches a main classifier that predicts the low-level label to the autoencoder's latent variable and an auxiliary classifier that predicts the high-level label to a hidden layer of the autoencoder. In experiments on 22,512 academic papers conducted to evaluate the proposed methodology, the proposed model showed superior classification accuracy and F1-score compared with the traditional supervised autoencoder and DNN models.
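
One way to read the architecture above is as a single training objective with three terms. The abstract does not state the loss, so the following is only a sketch under assumptions: squared-error reconstruction, cross-entropy (CE) for both classifiers, and hypothetical weights \alpha and \beta.

```latex
% Assumed notation: x is the input document vector, h a hidden layer of the
% encoder, z the latent variable, and \hat{x} the decoder's reconstruction.
\mathcal{L}
  = \underbrace{\lVert x - \hat{x} \rVert^{2}}_{\text{reconstruction}}
  + \alpha \, \underbrace{\mathrm{CE}\big(y_{\text{low}},\, f_{\text{main}}(z)\big)}_{\text{main classifier on } z}
  + \beta \, \underbrace{\mathrm{CE}\big(y_{\text{high}},\, f_{\text{aux}}(h)\big)}_{\text{auxiliary classifier on } h}
```

Under this reading, the high-level label shapes the hidden representation through the auxiliary term, but the low-level prediction is made directly from z, so an incorrect high-level prediction does not gate the low-level one as it would in a top-down pipeline.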