Assessment of performance of machine learning based similarities calculated for different English translations of Holy Quran

  • Al Ghamdi, Norah Mohammad (Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU))
  • Khan, Muhammad Badruddin (Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU))
  • Received : 2022.04.05
  • Published : 2022.04.30

Abstract

This article applies machine learning based similarity techniques to religious text in order to identify similarities and differences among its translations. The dataset comprises 10 English translations of the verses (Arabic: Ayah) of two Surahs (chapters), Al-Humazah and An-Nasr. For each verse, quantitative similarity values between pairs of translations were calculated using cosine similarity and semantic similarity. The corpus went through two series of experiments: before pre-processing and after pre-processing. To assess the performance of the machine learning based similarities, human annotated similarities between the translations of both Surahs were recorded as the ground truth. For Surah Al-Humazah, the average difference between the human annotated similarity and the cosine similarity was 1.38 per verse per pair of translations; after pre-processing, it increased to 2.24. The average difference between the human annotated similarity and the semantic similarity for the same Surah was 0.09 per verse per pair of translations; after pre-processing, it increased to 0.78. For Surah An-Nasr, the average difference between the human annotated similarity and the cosine similarity was 1.93 per verse per pair of translations before pre-processing and increased to 2.47 after pre-processing. The average difference between the human annotated similarity and the semantic similarity for Surah An-Nasr was 0.93 before pre-processing and fell to 0.87 after pre-processing. As expected, semantic similarity proved to be the better measure of similarity in meaning: in every setting it stayed closer to the human annotations than cosine similarity did.
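The evaluation described above, pairwise cosine similarity compared against human annotated scores, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a plain bag-of-words representation (the abstract does not state the vectorization used), it assumes the "average difference" is a mean absolute difference, and it assumes human and machine scores share one scale. The strings and the `human` scores are invented toy values, not actual translations or annotations from the study.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so report no similarity
    return dot / (norm_a * norm_b)


def mean_abs_difference(machine, human):
    """Average |machine - human| over all translation pairs (assumed metric)."""
    return sum(abs(machine[pair] - human[pair]) for pair in machine) / len(machine)


# Toy stand-ins for three renderings of one verse (NOT real translations).
renderings = {
    "T1": "the river runs to the sea",
    "T2": "the river flows to the sea",
    "T3": "mountains rise above the clouds",
}

# Machine similarity for every pair of renderings.
machine = {
    (a, b): cosine_similarity(renderings[a], renderings[b])
    for a, b in combinations(sorted(renderings), 2)
}

# Hypothetical human annotated scores on the same 0-to-1 scale.
human = {("T1", "T2"): 0.9, ("T1", "T3"): 0.1, ("T2", "T3"): 0.1}

for pair, score in machine.items():
    print(pair, round(score, 3), "human:", human[pair])
print("average difference:", round(mean_abs_difference(machine, human), 3))
```

Running the same comparison on text before and after pre-processing (for example, stop-word removal or stemming) would give the two series of experiments the abstract refers to; a smaller average difference means the machine score tracks the human judgment more closely.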
