HMM Based Part of Speech Tagging for Hadith Isnad

  • Abdelkarim Abdelkader (Computer Engineering Department, Faculty of Computing at Algunfdha, Umm Al-Qura University)
  • Received : 2023.03.05
  • Published : 2023.03.30

Abstract

The Hadith is the second source of Islamic jurisprudence after the Qur'an. Both sources are indispensable for Muslims to practice Islam. All Ahadith have been collected and written down, but most books of Hadith contain Ahadith that may be weak or rejected. For this reason, scholars of Hadith long ago defined laws, rules and principles for distinguishing the correct Hadith (Sahih) from the fair (Hassen) and the weak (Dhaif). Unfortunately, these rules, laws and principles are still applied manually by specialists or students. The work presented in this paper is part of the automatic processing of Hadith. More specifically, it aims to automatically process the chain of narrators (Hadith Isnad) in order to find its different components and assign each component its own tag using a statistical method: Hidden Markov Models (HMM). This method is a powerful abstraction for time series data and a robust tool for representing probability distributions over sequences of observations. In this paper, we describe an important tool in Hadith Isnad processing: an HMM-based chunker. The role of this tool is to decompose the chain of narrators (Isnad) and determine the tag of each part of Isnad (POI). First, we compiled a tagset containing 13 tags. Then, we used these tags to manually build a corpus of 100 chains of narrators from "Sahih Alboukhari", and we extracted a lexicon from this corpus. This lexicon is a set of XML documents based on HPSG features, and it contains information on 134 narrators. After that, we designed and implemented an HMM-based analyzer that assigns each part of Isnad its proper tag and each narrator its features. The system was tested on 2661 non-duplicated Isnads from "Sahih Alboukhari" and achieved an F-score of 93%.
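
To make the tagging step concrete, the following is a minimal sketch of HMM training and Viterbi decoding over a chain of narrators. It is not the system described above: the two tags (`TRANS` for a transmission term, `NARR` for a narrator name), the transliterated toy chains, the add-alpha smoothing and the function names `train_hmm` and `viterbi` are all illustrative assumptions, whereas the paper uses a 13-tag tagset and a 100-chain corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: two tagged chains with an assumed two-tag tagset.
# The actual system uses 13 tags and a corpus of 100 chains.
TRAIN = [
    [("haddathana", "TRANS"), ("Malik", "NARR"), ("an", "TRANS"), ("Nafi", "NARR")],
    [("akhbarana", "TRANS"), ("Nafi", "NARR"), ("an", "TRANS"), ("Malik", "NARR")],
]


def train_hmm(sequences, alpha=1.0):
    """Estimate add-alpha smoothed log-probabilities from tagged sequences."""
    tags = sorted({t for seq in sequences for _, t in seq})
    vocab = {w for seq in sequences for w, _ in seq}
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in sequences:
        start[seq[0][1]] += 1
        for word, tag in seq:
            emit[tag][word] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[a][b] += 1

    def logp(counter, key, n_outcomes):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    n_words = len(vocab) + 1  # one extra slot reserved for unknown words
    return {
        "tags": tags,
        "start": {t: logp(start, t, len(tags)) for t in tags},
        "trans": {a: {b: logp(trans[a], b, len(tags)) for b in tags} for a in tags},
        "emit": lambda tag, word: logp(emit[tag], word, n_words),
    }


def viterbi(model, words):
    """Return the most probable tag sequence for a non-empty word list."""
    tags = model["tags"]
    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: model["start"][t] + model["emit"](t, words[0]) for t in tags}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + model["trans"][p][t])
            scores[t] = (best[-1][prev] + model["trans"][prev][t]
                         + model["emit"](t, word))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


model = train_hmm(TRAIN)
print(viterbi(model, ["haddathana", "Nafi", "an", "Malik"]))
```

Working in log-probabilities avoids numerical underflow on long chains, and the smoothing gives a narrator name that never appeared in training a small non-zero emission probability, so the transition probabilities can still place it.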
