Negative Reasons Classification for Twitter Data on US Airline Services

  • Received : 2021.02.13
  • Accepted : 2021.03.31
  • Published : 2021.04.30

Abstract

Many companies try to analyze and utilize feedback on their services, which can be used to improve service quality or to inform marketing. To date, most natural language processing studies have analyzed sentiment divided into positive, negative, and neutral classes. In this work, however, specific negative reasons are extracted and classified. The dataset is a standard Kaggle dataset of tweets about U.S. airline services, in which tweets categorized as negative are labeled with one of 10 negative-reason classes. The data were split into train, validation, and test sets at an 8:1:1 ratio. The learning and classification process consists of two stages. The first converts words and sentences into vector values; Doc2Vec and BERT (Bidirectional Encoder Representations from Transformers) models are compared and analyzed for embedding and vectorization. The second trains a classifier that matches the vectorized sentences to the 10 negative-reason classes. During this training, I converted each negative reason into a sentence, appended it after the original tweet to create new data, and then used BERT's Next Sentence Prediction technique to perform further learning, which improved classification accuracy. For each dataset and classification method, metrics were computed, visualized, and compared.
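
As a rough illustration of the first stage, the sketch below uses gensim's Doc2Vec to turn tweets into fixed-length vectors that a downstream classifier over the 10 negative-reason classes could consume. The example tweets and hyperparameter values are illustrative assumptions, not the settings reported in the paper.

```python
# A minimal sketch of the first stage (tweet -> fixed-length vector) using
# gensim's Doc2Vec. The tweets and hyperparameters below are illustrative
# assumptions, not the settings reported in the paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tweets = [
    "@united my flight was delayed for three hours",
    "@AmericanAir you lost my luggage again",
]
corpus = [
    TaggedDocument(words=t.lower().split(), tags=[i])
    for i, t in enumerate(tweets)
]

# Training runs automatically when documents are passed to the constructor.
model = Doc2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=40)

# Infer a vector for an unseen tweet; this vector would feed a downstream
# classifier over the 10 negative-reason classes.
vec = model.infer_vector("@delta terrible customer service".lower().split())
print(vec.shape)  # (100,)
```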
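The sentence-pair idea behind the second stage can be sketched as follows: each candidate negative reason is verbalized as a sentence, appended after the tweet, and the pair is scored with BERT's Next Sentence Prediction head. This is a minimal sketch of the pairing mechanism only; `REASON_TEMPLATES` and `score_reasons` are hypothetical names, and the paper's actual pipeline performs further learning on the labeled pairs, which this zero-shot sketch omits.

```python
# A minimal sketch of the sentence-pair construction: each negative reason is
# verbalized as a sentence, appended after the tweet, and the pair is scored
# with BERT's Next Sentence Prediction (NSP) head. REASON_TEMPLATES and
# score_reasons are hypothetical names; the paper additionally trains on the
# labeled pairs, which this zero-shot sketch omits.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

# Illustrative verbalizations for three of the 10 negative-reason labels.
REASON_TEMPLATES = {
    "Late Flight": "The complaint is about a late flight.",
    "Lost Luggage": "The complaint is about lost luggage.",
    "Customer Service Issue": "The complaint is about customer service.",
}

def score_reasons(tweet: str) -> dict:
    """Score each reason sentence as a plausible continuation of the tweet."""
    scores = {}
    for reason, sentence in REASON_TEMPLATES.items():
        enc = tokenizer(tweet, sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**enc).logits  # shape (1, 2); index 0 = "is next"
        scores[reason] = torch.softmax(logits, dim=-1)[0, 0].item()
    return scores

print(score_reasons("@united three hours late and nobody at the gate helped"))
```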

References

  1. A. Rane and A. Kumar, "Sentiment Classification System of Twitter Data for US Airline Service Analysis," in 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Vol. 1, pp. 769-773, IEEE, 2018.
  2. Q. Le and T. Mikolov, "Distributed Representations of Sentences and Documents," in International Conference on Machine Learning, pp. 1188-1196, PMLR, 2014.
  3. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
  4. A. Natekin and A. Knoll, "Gradient Boosting Machines, a Tutorial," Frontiers in Neurorobotics, Vol. 7, p. 21, 2013. https://doi.org/10.3389/fnbot.2013.00021
  5. D. Chen, J. Bolton, and C. D. Manning, "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task," arXiv preprint arXiv:1606.02858, 2016.
  6. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," arXiv preprint arXiv:1706.03762, 2017.
  7. T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," arXiv preprint arXiv:1301.3781, 2013.
  8. T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794, 2016.
  9. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735