Negative Reasons Classification for Twitter Data on US Airline Services

  • Received : 2021.02.13
  • Accepted : 2021.03.31
  • Published : 2021.04.30

Abstract

Many companies try to analyze and utilize feedback on their services, which can be used to improve service quality or to inform marketing. To date, most natural language processing studies have analyzed sentiment divided into positive, negative, and neutral classes. In this work, however, specific negative reasons are extracted and classified. The dataset is a standard Kaggle dataset of tweets about U.S. airline services, in which tweets categorized as negative are labeled with one of 10 negative-reason classes. The data were split into train, validation, and test sets at an 8:1:1 ratio. The learning and classification process consists of two stages. The first converts words and sentences into vector values; Doc2Vec and BERT (Bidirectional Encoder Representations from Transformers) models are compared and analyzed for embedding and vectorization. The second trains a classifier that matches the vectorized sentences to the 10 negative-reason classes. During this training, I converted each negative reason into a sentence, appended it after the original tweet to create new data, and then used BERT's Next Sentence Prediction technique to perform further learning, which improved classification accuracy. For each dataset and classification method, metrics were computed, visualized, and compared.
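
As a rough illustration of the first stage, the sketch below uses gensim's Doc2Vec to turn tweets into fixed-length vectors that a downstream classifier over the 10 negative-reason classes could consume. The example tweets and hyperparameter values are illustrative assumptions, not the settings reported in the paper.

```python
# A minimal sketch of the first stage (tweet -> fixed-length vector) using
# gensim's Doc2Vec. The tweets and hyperparameters below are illustrative
# assumptions, not the settings reported in the paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tweets = [
    "@united my flight was delayed for three hours",
    "@AmericanAir you lost my luggage again",
]
corpus = [
    TaggedDocument(words=t.lower().split(), tags=[i])
    for i, t in enumerate(tweets)
]

# Training runs automatically when documents are passed to the constructor.
model = Doc2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=40)

# Infer a vector for an unseen tweet; this vector would feed a downstream
# classifier over the 10 negative-reason classes.
vec = model.infer_vector("@delta terrible customer service".lower().split())
print(vec.shape)  # (100,)
```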
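The sentence-pair idea behind the second stage can be sketched as follows: each candidate negative reason is verbalized as a sentence, appended after the tweet, and the pair is scored with BERT's Next Sentence Prediction head. This is a minimal sketch of the pairing mechanism only; `REASON_TEMPLATES` and `score_reasons` are hypothetical names, and the paper's actual pipeline performs further learning on the labeled pairs, which this zero-shot sketch omits.

```python
# A minimal sketch of the sentence-pair construction: each negative reason is
# verbalized as a sentence, appended after the tweet, and the pair is scored
# with BERT's Next Sentence Prediction (NSP) head. REASON_TEMPLATES and
# score_reasons are hypothetical names; the paper additionally trains on the
# labeled pairs, which this zero-shot sketch omits.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

# Illustrative verbalizations for three of the 10 negative-reason labels.
REASON_TEMPLATES = {
    "Late Flight": "The complaint is about a late flight.",
    "Lost Luggage": "The complaint is about lost luggage.",
    "Customer Service Issue": "The complaint is about customer service.",
}

def score_reasons(tweet: str) -> dict:
    """Score each reason sentence as a plausible continuation of the tweet."""
    scores = {}
    for reason, sentence in REASON_TEMPLATES.items():
        enc = tokenizer(tweet, sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**enc).logits  # shape (1, 2); index 0 = "is next"
        scores[reason] = torch.softmax(logits, dim=-1)[0, 0].item()
    return scores

print(score_reasons("@united three hours late and nobody at the gate helped"))
```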

References

  1. A. Rane and A. Kumar, "Sentiment Classification System of Twitter Data for US Airline Service Analysis," in 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Vol. 1, pp. 769-773, IEEE, 2018.
  2. Q. Le and T. Mikolov, "Distributed Representations of Sentences and Documents," in International Conference on Machine Learning, pp. 1188-1196, PMLR, 2014.
  3. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
  4. A. Natekin and A. Knoll, "Gradient Boosting Machines, a Tutorial," Frontiers in Neurorobotics, Vol. 7, p. 21, 2013. https://doi.org/10.3389/fnbot.2013.00021
  5. D. Chen, J. Bolton, and C. D. Manning, "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task," arXiv preprint arXiv:1606.02858, 2016.
  6. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," arXiv preprint arXiv:1706.03762, 2017.
  7. T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," arXiv preprint arXiv:1301.3781, 2013.
  8. T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794, 2016.
  9. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735