Exploring the Sentiment Analysis of Electric Vehicles Social Media Data by Using Feature Selection Methods

A Study on Sentiment Analysis of Electric Vehicle Social Media Data Using Feature Selection Methods

  • Costello, Francis Joseph (SKK Business School, Sungkyunkwan University)
  • Lee, Kun Chang (Global Business Administration/Dept of Health Sciences & Technology, SAIHST (Samsung Advanced Institute for Health Sciences & Technology), Sungkyunkwan University)
  • Received : 2019.01.02
  • Accepted : 2020.02.20
  • Published : 2020.02.28


This study presents a recently collected social media data set on electric vehicles (EVs) and applies sentiment analysis (SA) to it in order to gain insight into public opinion. We analyzed the public's sentiment toward EVs in two steps. First, we used an SA tool to extract the sentiment of each comment. Next, we labeled the data with the extracted sentiments and classified them. During classification we encountered the problem of high dimensionality, so we explored feature selection (FS) models as a way to reduce the dimensionality of the data set. Three FS models (Chi Squared, Information Gain, and ReliefF) showed the most promising results when used alongside logistic regression and support vector machine classification algorithms. The contribution of this paper is a real-world example of social media text analytics that can be adopted in many other areas of research and business. Moving forward, researchers can use the methodological approach in this paper to refine and improve their own use cases in text analytics.
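To make the feature selection step more concrete, the following is a minimal sketch of how two of the FS scores named above, Chi Squared and Information Gain, can rank terms in sentiment-labeled comments. It is not the authors' implementation or tooling: the comments in `DOCS`, the labels, and every function name are hypothetical and chosen only for illustration. ReliefF and the downstream classifiers are left out to keep the example short.

```python
import math

# Hypothetical toy comments with sentiment labels (made up, not from the study).
DOCS = [
    ("love the quiet ride and fast charging", "pos"),
    ("great range and love the torque", "pos"),
    ("charging is fast and the ride is great", "pos"),
    ("battery died and the range is terrible", "neg"),
    ("terrible charging network and slow", "neg"),
    ("slow charging and the battery is terrible", "neg"),
]


def contingency(term, docs, target):
    """Return 2x2 counts of term presence versus membership in the target class."""
    a = b = c = d = 0
    for text, label in docs:
        present = term in text.split()
        if present and label == target:
            a += 1  # term present, target class
        elif present:
            b += 1  # term present, other class
        elif label == target:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    return a, b, c, d


def chi_squared(a, b, c, d):
    """Chi-squared statistic of a 2x2 table; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """Class entropy minus class entropy conditioned on term presence."""
    n = a + b + c + d
    h_given = 0.0
    for in_target, in_other in ((a, b), (c, d)):
        size = in_target + in_other
        if size:
            h_given += size / n * entropy([in_target, in_other])
    return entropy([a + c, b + d]) - h_given


def rank_terms(docs, score_fn, target="pos", k=5):
    """Score every vocabulary term and return the k highest-scoring ones."""
    vocab = sorted({word for text, _ in docs for word in text.split()})
    scored = [(score_fn(*contingency(t, docs, target)), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(term, round(score, 3)) for score, term in scored[:k]]


if __name__ == "__main__":
    print("chi-squared:     ", rank_terms(DOCS, chi_squared))
    print("information gain:", rank_terms(DOCS, information_gain))
```

On this toy data both scores rank "terrible" first, since it appears in every negative comment and in no positive one, while a term such as "and" that occurs in every comment scores zero. Keeping only the top-ranked terms as classifier inputs is the sense in which such scores reduce dimensionality.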

