Application of Topic Modeling Techniques in Arabic Content: A Systematic Review

  • Maram Alhmiyani (Umm Al-Qura University, College of Computer and Information System)
  • Huda Alhazmi (Umm Al-Qura University, College of Computer and Information System)
  • Submitted: 2023.06.05
  • Published: 2023.06.30

Abstract

With the rapid increase of user-generated data on digital platforms, categorizing and classifying these huge volumes of data has become difficult. Topic modeling is an unsupervised machine learning technique that can be used to summarize a large collection of documents. Topic modeling has been widely applied to English content, yet its application to the Arabic language remains limited. Therefore, the aim of this paper is to provide a systematic review of the application of topic modeling algorithms to Arabic content. We searched well-known and trusted databases, including ScienceDirect, IEEE Xplore, Springer Link, and Google Scholar, for work published from 2012 to 2022 and retrieved 60 papers. After refining the papers based on predefined criteria, 32 papers remained. Our results show that the application of topic modeling techniques to Arabic content is still limited.
