Semantic Feature Analysis for Multi-Label Text Classification on Topics of the Al-Quran Verses

  • Received : 2021.07.05
  • Accepted : 2022.09.29
  • Published : 2024.02.29

Abstract

Islamic texts, including the Hadith and the Al-Quran, are now widely used in research, most often in natural language processing and especially in text classification. The Al-Quran is the main source of Islamic law and a guide for the daily life of Muslims, yet learning it is made difficult by ambiguity. This research aims to make learning the Al-Quran easier. We propose a feature extraction method that combines word embedding features with the Tensor Space Model to reduce this ambiguity. The experimental results and analysis show that the proposed method achieves the best performance, with a Hamming loss of 0.10317.
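The Hamming loss reported above is the standard multi-label metric: the fraction of individual (verse, topic) label decisions that the classifier gets wrong, averaged over all verses and all topics. Lower is better, and 0 means every topic label is correct. The sketch below shows that standard definition, assuming each verse's labels are encoded as a 0/1 indicator row with one column per topic. The toy labels are hypothetical and are not the paper's dataset; for binary indicator arrays the same value is also returned by scikit-learn's `sklearn.metrics.hamming_loss`.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Standard multi-label Hamming loss.

    HL = (1 / (N * L)) * sum_i sum_j [y_true[i][j] != y_pred[i][j]]
    where N is the number of samples (verses) and L the number of labels (topics).
    Assumes 0/1 indicator rows of equal length.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    if not y_true:
        raise ValueError("need at least one sample")
    n_labels = len(y_true[0])
    mismatches = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != n_labels or len(pred_row) != n_labels:
            raise ValueError("every row must have the same number of labels")
        # Count the topic slots where the predicted label disagrees with the truth.
        mismatches += sum(t != p for t, p in zip(true_row, pred_row))
    return mismatches / (len(y_true) * n_labels)


if __name__ == "__main__":
    # Hypothetical example: 3 verses, 4 topics (1 = verse belongs to the topic).
    y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
    y_pred = [[1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]
    # 2 of the 12 label slots are wrong, so the loss is 2/12 (about 0.167).
    print(hamming_loss(y_true, y_pred))
```

Read this way, the reported Hamming loss of 0.10317 means that roughly 10.3% of the individual verse-topic decisions were incorrect.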
