Slangs and Short forms of Malay Twitter Sentiment Analysis using Supervised Machine Learning

  • Yin, Cheng Jet (Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka (UTeM))
  • Ayop, Zakiah (Information Security Forensics and Computer Networking (INSFORNET))
  • Anawar, Syarulnaziah (Information Security Forensics and Computer Networking (INSFORNET))
  • Othman, Nur Fadzilah (Information Security Forensics and Computer Networking (INSFORNET))
  • Zainudin, Norulzahrah Mohd (Jabatan Sains Komputer, Fakulti Sains dan Teknologi Pertahanan, Universiti Pertahanan Nasional Malaysia (UPNM))
  • Received: 2021.11.05
  • Published: 2021.11.30

Abstract

Society today relies on social media every day, and the Malay internet slang and short forms used there can be offensive to a person. This paper aims to determine which of the chosen supervised machine learning algorithms used in sentiment analysis achieves higher accuracy in detecting Malay internet slang and short forms. To analyze the results of the supervised machine learning classifiers, we chose two datasets: one based on a political topic, and a second consisting of the same set mixed with 50 tweets per targeted keyword. The datasets are manually labelled as positive or negative, and the 275 tweets are then separated into training and testing sets. The Naïve Bayes and Random Forest classifiers are then analyzed and evaluated on their performance. Our experimental results show that Random Forest is a better classifier than Naïve Bayes.
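The pipeline the abstract describes (handling short forms, manual positive/negative labels, a train/test split, then comparing classifier accuracy) can be illustrated with a minimal, dependency-free Python sketch of the Naïve Bayes side. This is not the authors' implementation, and the paper does not state its feature representation here. The slang lexicon `SLANG`, the default 80/20 `test_ratio`, the toy tweets, and the names `tokenize`, `train_test_split`, `SimpleNaiveBayes`, and `accuracy` are all hypothetical, chosen for illustration. The Random Forest side is not sketched; one option would be scikit-learn's `RandomForestClassifier` fitted on a bag-of-words matrix.

```python
import math
import random
import re
from collections import Counter

# Hypothetical short-form lexicon for illustration only; the paper's own
# slang list is not reproduced here. Maps a Malay short form to its full word.
SLANG = {"yg": "yang", "dgn": "dengan", "x": "tak", "tp": "tapi",
         "sy": "saya", "bkn": "bukan"}


def tokenize(text):
    """Lower-case, split into word tokens, and expand known short forms."""
    return [SLANG.get(w, w) for w in re.findall(r"\w+", text.lower())]


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the labelled rows and cut it into train/test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


class SimpleNaiveBayes:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.classes = sorted(self.doc_counts)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(w | c) over the tweet's known tokens
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the manual labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy tweets, NOT drawn from the paper's 275-tweet dataset.
    toy = [
        ("suka sangat polisi ni", "positive"),
        ("bagus betul kerajaan", "positive"),
        ("terbaik la menteri", "positive"),
        ("sy x suka langsung", "negative"),
        ("teruk betul la", "negative"),
        ("benci gila polisi yg baru", "negative"),
    ]
    train, test = train_test_split(toy, test_ratio=0.34, seed=0)
    model = SimpleNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    preds = model.predict([t for t, _ in test])
    print("held-out accuracy:", accuracy([y for _, y in test], preds))
```

Expanding short forms before counting lets, for example, `sy` and `saya` share a single count, which is one plausible way to keep slang from fragmenting the vocabulary of a small labelled set.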

Acknowledgement

This publication has been supported by the Center of Research and Innovation Management (CRIM), Universiti Teknikal Malaysia Melaka (UTeM). The authors would like to thank UTeM and the INSFORNET research group members for their support.
