• Title/Summary/Keyword: Sentiment Feature Extraction


A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers in turn degrades the sentiment categorization systems built on them. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) and phoneme patterns. Both kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In an experiment with Korean customer reviews, the system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
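The phoneme-pattern idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy pattern set, and the use of Unicode code point arithmetic to decompose Hangul syllables are all assumptions made for illustration. Each syllable is reduced to its leading consonant and vowel, so a common misspelling such as 조아 maps to the same key as 좋아, and greedy longest matching is then applied against a pattern set.

```python
# Illustrative sketch only; names and the toy pattern set are hypothetical.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 leading consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3


def phoneme_key(text):
    """Map each Hangul syllable to leading consonant + vowel, dropping any final consonant."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            lead = offset // 588              # 588 = 21 vowels * 28 finals
            vowel = (offset % 588) // 28
            out.append(LEADS[lead] + VOWELS[vowel])
        else:
            out.append(ch)                    # keep non-Hangul characters unchanged
    return "".join(out)


def longest_match(text, patterns):
    """Greedy left-to-right longest matching of `text` against a set of pattern strings."""
    max_len = max((len(p) for p in patterns), default=0)
    features, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in patterns:
                features.append(candidate)
                i += size
                break
        else:
            i += 1                            # no pattern starts here; skip one character
    return features


# Toy pattern set built from two predicate stems (hypothetical data).
patterns = {phoneme_key("좋아"), phoneme_key("싫어")}
features = longest_match(phoneme_key("정말 조아요"), patterns)
```

In this sketch the misspelled review text still yields the pattern derived from 좋아, which is the robustness property the abstract attributes to phoneme patterns. Eojeol patterns and the SVM classifier are not shown.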

A Korean Sentence and Document Sentiment Classification System Using Sentiment Features (감정 자질을 이용한 한국어 문장 및 문서 감정 분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.14 no.3 / pp.336-340 / 2008
  • Sentiment classification is a recent subdiscipline of text classification that is concerned not with topic but with opinion. In this paper, we present a Korean sentence and document sentiment classification system that uses effective sentiment features. Korean sentiment classification starts by constructing effective sentiment feature sets for the positive and negative classes. The synonym information of an English word thesaurus is used to extract effective sentiment features, and the extracted English sentiment features are then translated into Korean features using an English-Korean dictionary. A sentence or document is represented using the extracted sentiment features and is classified and evaluated with an SVM (Support Vector Machine).
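The expand-then-translate pipeline described in this abstract might look roughly like the following sketch. The thesaurus, the English-Korean dictionary, the seed words, and all function names are toy, hypothetical stand-ins; the actual resources used in the paper are not given in the abstract.

```python
# Illustrative sketch only: the thesaurus and dictionary are toy,
# hypothetical data, not the resources used in the paper.
TOY_THESAURUS = {
    "good": ["fine", "excellent"],
    "bad": ["poor", "awful"],
}
TOY_EN_KO = {
    "good": "좋다",
    "excellent": "훌륭하다",
    "bad": "나쁘다",
    "awful": "끔찍하다",
}


def expand_seeds(seeds, thesaurus):
    """Return the seed words plus their listed synonyms, in order, without duplicates."""
    expanded = []
    for seed in seeds:
        for word in [seed, *thesaurus.get(seed, [])]:
            if word not in expanded:
                expanded.append(word)
    return expanded


def translate(words, dictionary):
    """Keep only words that have a dictionary entry and return their translations."""
    return [dictionary[w] for w in words if w in dictionary]


def to_vector(tokens, features):
    """Represent a tokenized text as binary presence flags over the feature list."""
    present = set(tokens)
    return [1 if f in present else 0 for f in features]


positive = translate(expand_seeds(["good"], TOY_THESAURUS), TOY_EN_KO)
negative = translate(expand_seeds(["bad"], TOY_THESAURUS), TOY_EN_KO)
vector = to_vector(["배송", "좋다"], positive + negative)
```

Vectors of this kind would then be passed to a classifier such as an SVM; that step is omitted here.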

Research on Chinese Microblog Sentiment Classification Based on TextCNN-BiLSTM Model

  • Haiqin Tang;Ruirui Zhang
    • Journal of Information Processing Systems / v.19 no.6 / pp.842-857 / 2023
  • Currently, most sentiment classification models on microblogging platforms analyze sentence parts of speech and emoticons without comprehending users' emotional inclinations and grasping moral nuances. This study proposes a hybrid sentiment analysis model. Given the distinct nature of microblog comments, the model employs a combined stop-word list and word2vec for word vectorization. To mitigate local information loss, a TextCNN model without pooling layers is employed for local feature extraction, while BiLSTM is used for contextual feature extraction. Microblog comment sentiments are then categorized by a classification layer. Given the binary classification task at the output layer and the numerous hidden layers within BiLSTM, the Tanh activation function is adopted in this model. Experimental findings demonstrate that the enhanced TextCNN-BiLSTM model attains a precision of 94.75%. This represents improvements of 1.21%, 1.25%, and 1.25% in precision, recall, and F1 values, respectively, over the standalone TextCNN model. It also outperforms the standalone BiLSTM model by 0.78%, 0.9%, and 0.9% in precision, recall, and F1 values, respectively.
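For reference, the three metrics reported in this abstract are assumed to follow the standard definitions for binary classification. The symbols $TP$, $FP$, and $FN$ (true positives, false positives, false negatives) are introduced here and do not appear in the abstract:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```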

A Korean Emotion Features Extraction Method and Their Availability Evaluation for Sentiment Classification (감정 분류를 위한 한국어 감정 자질 추출 기법과 감정 자질의 유용성 평가)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.499-517 / 2008
  • In this paper, we propose an effective emotion feature extraction method for Korean and evaluate the usefulness of the extracted features in sentiment classification. Korean emotion features are expanded from several representative emotion words, and they play an important role in building an effective sentiment classification system. First, the synonym information of an English word thesaurus is used to extract effective emotion features, and the extracted English emotion features are then translated into Korean. To evaluate the extracted Korean emotion features, we represent each document using the extracted features and classify it with an SVM (Support Vector Machine). In the experiments, the sentiment classification system using the extracted Korean emotion features improved performance by 14.1% over a system using the content-word-based features that are generally used in common text classification systems.


Intensified Sentiment Analysis of Customer Product Reviews Using Acoustic and Textual Features

  • Govindaraj, Sureshkumar;Gopalakrishnan, Kumaravelan
    • ETRI Journal / v.38 no.3 / pp.494-501 / 2016
  • Sentiment analysis incorporates natural language processing and artificial intelligence and has evolved as an important research area. Sentiment analysis on product reviews has been used in widespread applications to improve customer retention and business processes. In this paper, we propose a method for performing an intensified sentiment analysis on customer product reviews. The method involves the extraction of two feature sets from each of the given customer product reviews, a set of acoustic features (representing emotions) and a set of lexical features (representing sentiments). These sets are then combined and used in a supervised classifier to predict the sentiments of customers. We use an audio speech dataset prepared from Amazon product reviews and downloaded from the YouTube portal for the purposes of our experimental evaluations.

Term Frequency-Inverse Document Frequency (TF-IDF) Technique Using Principal Component Analysis (PCA) with Naive Bayes Classification

  • J.Uma;K.Prabha
    • International Journal of Computer Science & Network Security / v.24 no.4 / pp.113-118 / 2024
  • Performing sentiment analysis on Twitter is difficult, even though it is widely used for reviews. The reason is that tweets are extremely short and mostly contain slang, emoticons, and hashtags alongside other words. Feature extraction covers every technique for deriving structure and aspect features from individual tweets. Each element of a feature vector is an integer that contributes to assigning a sentiment class to a tweet. The purpose of feature extraction is to select the right features in order to improve the accuracy of the classification models. In this manuscript, we propose combining the Term Frequency-Inverse Document Frequency (TF-IDF) method with Principal Component Analysis (PCA) and Naïve Bayes classifiers. In the classification process, the proposed work can produce different aspects from the highly valued features of a Twitter dataset.
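As a rough illustration of the TF-IDF step named in this abstract, the sketch below computes term weights for tokenized tweets. The abstract does not say which TF-IDF variant is used, so the formula here (term count divided by document length, times the natural log of N over document frequency) is an assumption, as are the function name and the toy tweets. The PCA and Naïve Bayes stages are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tweets (hypothetical data).
tweets = [
    ["great", "phone", "love"],
    ["bad", "phone", "slow"],
]
weights = tf_idf(tweets)
```

Under this variant a term that occurs in every tweet, such as "phone" here, receives a weight of zero, while terms specific to one tweet receive positive weights.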

A Comparative Study on Using SentiWordNet for English Twitter Sentiment Analysis (영어 트위터 감성 분석을 위한 SentiWordNet 활용 기법 비교)

  • Kang, In-Su
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.4 / pp.317-324 / 2013
  • Twitter sentiment analysis classifies a tweet (message) into a positive or negative sentiment class. This study deals with SentiWordNet (SWN)-based Twitter sentiment analysis. SWN is a sentiment dictionary in which each sense of an English word has a positive and a negative sentiment strength. There have been a variety of SWN-based sentiment feature extraction methods, which typically first determine the sentiment orientation (SO) of each term in a document and then decide the SO of the document from the terms' SO values. For example, for the SO of a term, some methods calculate the maximum or average of the sentiment scores of its senses, and others compute the average of the difference between positive and negative sentiment scores. For the SO of a document, many researchers use the maximum or average of the terms' SO values. In addition, the above procedure may be applied to the whole set of parts of speech (adjective, adverb, noun, and verb) or to a subset. This work provides a comparative study of SWN-based sentiment feature extraction schemes, with performance evaluation on a well-known Twitter dataset.
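The term-level and document-level SO schemes listed in this abstract can be sketched as follows. The sense scores are made-up placeholders rather than real SentiWordNet values, the function names are hypothetical, and `term_so_max` is only one possible reading of "maximum of sentiment scores of its senses".

```python
# Illustrative sketch only: the sense scores below are made-up placeholders,
# not values from SentiWordNet.
TOY_SENSES = {
    # term: list of (positive score, negative score), one pair per sense
    "good": [(0.75, 0.0), (0.5, 0.25)],
    "slow": [(0.0, 0.5), (0.25, 0.25)],
}


def term_so_avg_diff(senses):
    """Term SO as the average of (positive - negative) over all senses."""
    return sum(pos - neg for pos, neg in senses) / len(senses)


def term_so_max(senses):
    """Term SO as the signed score with the largest magnitude across senses."""
    best_pos = max(pos for pos, _ in senses)
    best_neg = max(neg for _, neg in senses)
    return best_pos if best_pos >= best_neg else -best_neg


def document_so(tokens, term_so, combine="avg"):
    """Document SO as the average or the maximum of its known terms' SO values."""
    scores = [term_so(TOY_SENSES[t]) for t in tokens if t in TOY_SENSES]
    if not scores:
        return 0.0                      # no term with sentiment information
    return max(scores) if combine == "max" else sum(scores) / len(scores)


tweet = ["good", "but", "slow"]
avg_score = document_so(tweet, term_so_avg_diff)
max_score = document_so(tweet, term_so_max, combine="max")
```

Swapping the `term_so` function or the `combine` argument corresponds to switching between the schemes the study compares; restricting `TOY_SENSES` to certain parts of speech would model the subset variants.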

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems / v.17 no.4 / pp.818-833 / 2021
  • To address the problems of low precision, insufficient feature extraction, and poor use of context in existing text sentiment analysis methods, a hybrid CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through transfer learning with Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network, and the emotional tendency was obtained using softmax. Finally, topics were extracted using term frequency-inverse document frequency and K-means. Compared with the CNN, BiLSTM, and gated recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher by 0.0147, 0.006, and 0.0052, respectively. Compared with the CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, its F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, scalability was verified on a takeaway dataset, which supports the model's value in practical applications.

Research on Sentiment Analysis Method Based on Weibo Comments

  • Li, Zhong-Shi;He, Lin;Guo, Wei-Jie;Jin, Zhe-Zhi
    • East Asian Mathematical Journal / v.37 no.5 / pp.599-612 / 2021
  • In China, Weibo is one of the social platforms with the most users. It is characterized by fast information transmission and wide coverage. People can comment on an event on Weibo to express their emotions and attitudes. Judging the emotional tendency of users' comments is not only useful for monitoring by management departments but also has very high application value for rumor suppression, public opinion guidance, and marketing. This paper proposes a two-input Adaboost model based on TextCNN and BiLSTM. The TextCNN model, which performs local feature extraction, and the BiLSTM model, which performs global feature extraction, process the comment data in parallel. Finally, the classification results of the two models are fused through an improved Adaboost algorithm to improve the accuracy of text classification.
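The abstract does not describe what its improved Adaboost algorithm changes, so the sketch below shows only the standard AdaBoost weighting rule applied to two classifiers' outputs, as a point of reference. The function names and the error rates are hypothetical.

```python
import math


def classifier_weight(error_rate):
    """Standard AdaBoost weight: alpha = 0.5 * ln((1 - err) / err), for 0 < err < 1."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def fuse(predictions, error_rates):
    """Weighted vote over +1 / -1 predictions; returns +1 or -1 (ties go to +1)."""
    total = sum(classifier_weight(err) * pred
                for pred, err in zip(predictions, error_rates))
    return 1 if total >= 0 else -1


# Hypothetical case: the first model (error 0.1) predicts positive,
# the second model (error 0.3) predicts negative.
label = fuse(predictions=[+1, -1], error_rates=[0.1, 0.3])
```

With these numbers the lower-error model carries the larger weight, so its prediction decides the fused label.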

Extracting Core Events Based on Timeline and Retweet Analysis in Twitter Corpus (트위터 문서에서 시간 및 리트윗 분석을 통한 핵심 사건 추출)

  • Tsolmon, Bayar;Lee, Kyung-Soon
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.69-74 / 2012
  • Many internet users focus, within a very short time, on issues that have been posted on social network services. When a major social issue or event occurs, it affects the number of comments and retweets on Twitter that day. In this paper, we propose a method for extracting core events from Twitter data based on timeline analysis, sentiment features, and retweet information. To validate our method, we compared methods using word frequency only, word frequency with sentiment analysis, the chi-square method only, and the chi-square method with sentiment analysis. To evaluate the proposed approach, we measured the accuracy of correct answers among the top 10 results. The proposed method achieved 94.9% on this measure. The experimental results show that the proposed method is effective for extracting core events from a Twitter corpus.
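The chi-square method mentioned among the compared approaches can be illustrated with the standard statistic for a 2x2 contingency table. How the paper builds its table is not stated in the abstract; the term-versus-day layout in the docstring and the function name are assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    Assumed layout (hypothetical):
    a: tweets on the target day containing the term
    b: tweets on the target day without the term
    c: tweets on other days containing the term
    d: tweets on other days without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty; the statistic is undefined
    return n * (a * d - b * c) ** 2 / denominator
```

A term spread evenly across days scores zero, while a term concentrated on one day scores high, which is why such a statistic can be used to rank candidate event terms.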