Exploring an Optimal Feature Selection Method for Effective Opinion Mining Tasks

Eo, Kyun Sun; Lee, Kun Chang

doi:10.9708/jksci.2019.24.02.171

Journal of the Korea Society of Computer and Information

Vol. 24, No. 2 / pp. 171-177 / 2019 / pISSN 1598-849X / eISSN 2383-9945

Korean Society of Computer Information

Eo, Kyun Sun (SKKU Business School, Sungkyunkwan University) ;
Lee, Kun Chang (SKKU Business School/SAIHST (Samsung Advanced Institute of Health Sciences & Technology), Sungkyunkwan University)

Received : 2018.11.13
Reviewed : 2019.01.07
Published : 2019.02.28

https://doi.org/10.9708/jksci.2019.24.02.171

Abstract

This paper aims to identify the most effective feature selection method for opinion mining tasks. Opinion mining tasks belong to sentiment analysis, which, from a text mining point of view, classifies the opinions expressed in online texts as positive or negative. Using a product review dataset covering five product groups (apparel, books, DVDs, electronics, and kitchen), TF-IDF and Bag-of-Words (BOW) values are calculated to form the product review feature sets. Next, we apply feature selection methods to see which one yields the most robust results. The results show that a stacking classifier built on the features selected by the Information Gain method yields the best result.
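To make the feature selection step in the abstract concrete, the following is a minimal sketch of ranking Bag-of-Words terms by Information Gain. It is not the authors' implementation: the toy reviews, the function names (`entropy`, `bag_of_words`, `information_gain`, `select_top_k`), and the choice of a binary "term occurs in the review" feature are all assumptions made for illustration, and the classifier stage (including stacking) is left out.

```python
import math
from collections import Counter

# Hypothetical toy reviews (NOT from the paper's dataset); 1 = positive, 0 = negative.
REVIEWS = [
    ("great book loved it", 1),
    ("great sound quality", 1),
    ("loved this blender", 1),
    ("poor quality broke fast", 0),
    ("poor fit returned it", 0),
    ("broke after a week", 0),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count Counter per document."""
    counts = [Counter(text.split()) for text in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def information_gain(term, counts, labels):
    """IG of the binary feature 'term occurs in the document' w.r.t. the labels.

    Assumption: presence/absence is used instead of raw counts or TF-IDF weights.
    """
    present = [y for c, y in zip(counts, labels) if c[term] > 0]
    absent = [y for c, y in zip(counts, labels) if c[term] == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank vocabulary terms by information gain and keep the k best."""
    vocab, counts = bag_of_words(docs)
    scored = [(information_gain(t, counts, labels), t) for t in vocab]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(t, ig) for ig, t in scored[:k]]


docs = [text for text, _ in REVIEWS]
labels = [label for _, label in REVIEWS]
top_terms = select_top_k(docs, labels, k=4)

for term, ig in top_terms:
    print(f"{term}: {ig:.3f}")
```

In this toy corpus, terms that appear only in reviews of one polarity (such as "great" or "poor") receive a higher gain than terms that appear equally in both (such as "quality" or "it"), which is the property a selection step of this kind relies on. In practice the selected terms would then be passed, as BOW or TF-IDF columns, to the downstream classifiers.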

Keywords

Table 1. Study of Opinion mining

Table 2. Results of Accuracy

Table 4. Results of AUC

Table 3. The number of features

Table 5. Results of T-test

