Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion

  • Chao, Hao (School of Computer Science and Technology, Henan Polytechnic University) ;
  • Lu, Bao-Yun (School of Computer Science and Technology, Henan Polytechnic University) ;
  • Liu, Yong-Li (School of Computer Science and Technology, Henan Polytechnic University) ;
  • Zhi, Hui-Lai (School of Computer Science and Technology, Henan Polytechnic University)
  • Received : 2017.03.03
  • Accepted : 2017.05.29
  • Published : 2018.02.28

Abstract

Vocal effort detection is important for both robust speech recognition and speaker recognition. In this paper, we first propose the spectral information entropy (SIE) feature, which carries more salient information about vocal effort level. We then present a model fusion method based on a complementary model to recognize vocal effort level. Experiments conducted on an isolated-word test set show that spectral information entropy performs best among the three kinds of features, and the recognition accuracy across all vocal effort levels reaches 81.6%. These results demonstrate the potential of the proposed method.

Fig. 1. Sorted average cepstral distances among the 5 VE levels for all phonemes.

Fig. 2. Comparison of average cepstral distances between MFCC and SIE for the five vowels.

Table 1. Six bands and their frequency ranges

Table 2. Two-stage VE detection results

Table 3. The performance of combined models

Table 4. The performance of complementary model
