A Survey of Machine Translation and Parts of Speech Tagging for Indian Languages

  • Khedkar, Vijayshri (Symbiosis Institute of Technology, Symbiosis International (Deemed University))
  • Shah, Pritesh (Symbiosis Institute of Technology, Symbiosis International (Deemed University))
  • Received : 2022.04.05
  • Published : 2022.04.30

Abstract

Machine translation, which began with IBM's work in 1954, has expanded immensely, particularly in recent years. It can be broken into seven main steps: token generation, morphological analysis, lexeme analysis, part-of-speech tagging, chunking, parsing, and word sense disambiguation. When translating Indian languages, morphological analysis plays a major role in building accurate part-of-speech taggers and word sense disambiguation. This paper surveys the machine translation methods that different researchers have applied to Indian languages, along with their performance and drawbacks. It then concentrates on part-of-speech (POS) tagging for the Marathi language using methods such as rule-based, unigram, and bigram tagging, among others. The study concludes that POS tagging is a major step in machine translation and that, for Marathi, the Hidden Markov Model gives the best POS tagging results, with an accuracy of 93%, which may be improved further depending on the dataset.
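
To make the Hidden Markov Model approach named in the abstract concrete, the following is a minimal sketch of a bigram HMM POS tagger decoded with the Viterbi algorithm. It is an illustration under stated assumptions, not the authors' implementation: the tiny English corpus, the tag set, the add-one smoothing, and the names `TRAIN`, `train_hmm`, `log_prob`, and `viterbi` are all hypothetical, and a tagged Marathi corpus would take the place of `TRAIN` in practice.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English for readability; not the paper's Marathi data).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra slot for unseen words
    n_tags = len(tags)
    # best[i][tag] = (log score of best path ending in tag at position i, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit, tags, vocab = train_hmm(TRAIN)
print(viterbi(["the", "dog", "sleeps"], trans, emit, tags, vocab))
# ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences, and the smoothing lets the tagger assign a tag to words it never saw in training. A unigram tagger, by contrast, would ignore `trans` and pick the most frequent tag for each word independently.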
