Multicriteria-Based Computer-Aided Pronunciation Quality Evaluation of Sentences

  • Received : 2012.01.07
  • Accepted : 2012.07.16
  • Published : 2013.02.01

Abstract

This paper defines the sentence-based pronunciation evaluation task in terms of subjective criteria. Three subjective criteria (the minimum subjective word score, the mean subjective word score, and first impression) are proposed and modeled as combinations of word-based assessments. These subjective criteria are then approximated with objective sentence pronunciation scores obtained by combining word-based metrics. No a priori study of common mistakes is required: class-based language models are used to incorporate both correct and incorrect pronunciations. Incorrect pronunciations are generated automatically from a competitive lexicon and the phonetic rules of the students' native and target languages. The procedure is therefore applicable to any second-language learning context, and subjective-objective sentence score correlations of 0.5 or higher can be achieved when the proposed sentence-based pronunciation criteria are approximated with combinations of word-based scores. Finally, the subjective-objective sentence score correlations reported here are comparable to those published for methods that do require a priori studies of pronunciation errors.
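The abstract describes sentence scores built by combining word-based scores and evaluated by their correlation with human ratings. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes per-word machine scores are already available, computes the minimum and mean aggregates that mirror two of the proposed criteria, fuses them with a hypothetical linear weight `alpha` (an assumption, not a value from the paper), and measures agreement with Pearson correlation (also an assumption, since the abstract does not name the correlation measure). The "first impression" criterion is not modeled because the abstract does not say how it is computed, and all data values are invented toy numbers.

```python
from math import sqrt
from statistics import mean


def sentence_scores(word_scores):
    """Aggregate per-word scores into the min and mean sentence-level scores."""
    if not word_scores:
        raise ValueError("a sentence needs at least one word score")
    return {"min": min(word_scores), "mean": mean(word_scores)}


def combined_score(word_scores, alpha=0.5):
    """Hypothetical linear fusion of the min and mean aggregates.

    alpha is an assumed weight for illustration only.
    """
    s = sentence_scores(word_scores)
    return alpha * s["min"] + (1 - alpha) * s["mean"]


def pearson(x, y):
    """Pearson correlation between subjective and objective sentence scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy data invented for illustration: per-word machine scores for four sentences
objective_words = [
    [0.9, 0.8, 0.95],
    [0.4, 0.7, 0.6],
    [0.2, 0.3, 0.5],
    [0.85, 0.9, 0.7],
]
subjective = [4.5, 3.0, 1.5, 4.0]  # hypothetical human sentence ratings

objective = [combined_score(words) for words in objective_words]
r = pearson(subjective, objective)
print(f"subjective-objective correlation r = {r:.2f}")
```

With real data, `alpha` would be tuned on held-out ratings, and other fusion rules (product, max, trained classifiers) could replace the linear combination.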
