Foreign Accents Classification of English and Urdu Languages, Design of Related Voice Data Base and A Proposed MLP based Speaker Verification System

  • Submitted : 2024.10.05
  • Published : 2024.10.30

Abstract

A medium-scale database of Urdu and English speakers covering multiple accents and dialects has been developed for use in Urdu and English speaker verification systems and in accent and dialect verification systems. Urdu is the national language of Pakistan and English is the official language. Most people in all regions of Pakistan, and in the Gilgit-Baltistan (GB) region in particular, are non-native speakers of both Urdu and English. To support the design of Urdu and English speaker verification systems for security applications in general, and telephone banking in particular, two databases have been designed: one for foreign-accented Urdu and another for foreign-accented English. Voice data was collected from 180 speakers from the GB region who could speak both Urdu and English. The speakers include both males and females, with ages ranging from 18 to 69 years. Finally, using a subset of the data, a multilayer perceptron (MLP) based speaker verification system has been designed. The system achieved an overall accuracy of 83.4091% on the English dataset and 80.0454% on the Urdu dataset. These figures are slightly lower, by about 4.1 percentage points for English and 7.5 percentage points for Urdu, than the 87.5% recognition accuracy reported for a recently proposed MLP-based speaker identification system (SIS).
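The comparison with the earlier MLP-based system can be checked directly from the three accuracies reported in the abstract. The short sketch below only does that subtraction; the variable and function names are illustrative and do not come from the paper.

```python
# Accuracies reported in the abstract, in percent.
mlp_english = 83.4091
mlp_urdu = 80.0454
baseline_sis = 87.5  # earlier MLP-based system used for comparison


def gap_in_points(baseline: float, observed: float) -> float:
    """Return baseline minus observed, in percentage points (hypothetical helper)."""
    return round(baseline - observed, 4)


english_gap = gap_in_points(baseline_sis, mlp_english)
urdu_gap = gap_in_points(baseline_sis, mlp_urdu)

print(f"English gap: {english_gap:.2f} percentage points")
print(f"Urdu gap: {urdu_gap:.2f} percentage points")
```

The gaps are absolute differences in percentage points, not relative changes with respect to the baseline accuracy.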
