Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim (Department of English Education, Gyeongsang National University)
  • Received : 2023.05.22
  • Accepted : 2023.06.14
  • Published : 2023.06.30

Abstract

This study explores the potential of automated speech recognition (ASR) for assessing English learners' pronunciation. We used ASR technology, which is regarded as impartial and consistent in its results, to analyze speech audio files: synthesized speech in both native-like English and Korean-accented English, and recordings from a native English speaker. This analysis established baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after they took a pronunciation course. Our sub-group analyses revealed positive training effects for both Whisper, an ASR tool, and the human raters, and identified distinct human rater strategies across assessment aspects (proficiency, intelligibility, accuracy, and comprehensibility) that were not observed in ASR. Despite challenges such as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.
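
For readers unfamiliar with the word error rate mentioned above, the following is a minimal sketch of how WER is commonly computed: the word-level Levenshtein (edit) distance between a reference transcript and an ASR transcript, divided by the number of reference words. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are illustrative assumptions; this is not the scoring procedure used in the study, which the abstract does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace; a real pipeline would likely also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion ("the") against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a lower WER means the recognizer's transcript is closer to the intended text. WER can exceed 1.0 when the hypothesis contains many inserted words.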

Acknowledgement

The author would like to express appreciation to teachers H.J. Park, H.S. Sohn, and W.C. Jung for their valuable discussions and participation in the pronunciation assessment experiment.
