Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

  • Kumar, Sandeep (Department of Electronics and Communication Engineering, National Institute of Technology)
  • Received : 2019.07.24
  • Accepted : 2020.03.24
  • Published : 2021.02.01

Abstract

In this work, six voiced/unvoiced speech classifiers based on the autocorrelation function (ACF), average magnitude difference function (AMDF), cepstrum, weighted ACF (WACF), zero-crossing rate and energy of the signal (ZCR-E), and neural networks (NNs) have been simulated and implemented in real time using the TMS320C6713 DSP starter kit. These speech classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system, and their performance has been compared in terms of voiced/unvoiced classification accuracy, speech quality, and computation time. The classification accuracy and speech quality results show that the NN-based speech classifier outperforms the ACF-, AMDF-, cepstrum-, WACF-, and ZCR-E-based speech classifiers in both clean and noisy environments. The computation time results show that the AMDF-based speech classifier is computationally the simplest and therefore the fastest, whereas the NN-based speech classifier requires the most computation time.
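As an illustration of the simplest of the decision rules compared above, a ZCR-E classifier labels a frame voiced when its short-time energy is high and its zero-crossing rate is low, and unvoiced otherwise. A minimal sketch in Python follows; the thresholds and the function name are illustrative assumptions, not values taken from the paper or its DSP implementation:

```python
import numpy as np

def classify_frame_zcr_energy(frame, zcr_thresh=0.25, energy_thresh=0.01):
    """Label a speech frame 'voiced' or 'unvoiced' from its zero-crossing
    rate and short-time energy (hypothetical thresholds)."""
    frame = np.asarray(frame, dtype=float)
    # Zero-crossing rate: fraction of adjacent sample pairs with a sign change.
    zcr = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
    # Short-time energy, normalized by frame length.
    energy = np.mean(frame ** 2)
    # Voiced speech: high energy, low ZCR; unvoiced speech: the opposite.
    return "voiced" if (energy > energy_thresh and zcr < zcr_thresh) else "unvoiced"

# Example frames: a 100 Hz sinusoid (voiced-like) vs. low-level white noise
# (unvoiced-like), both 30 ms at an 8 kHz sampling rate.
fs = 8000
t = np.arange(0, 0.03, 1 / fs)
voiced_like = np.sin(2 * np.pi * 100 * t)
rng = np.random.default_rng(0)
unvoiced_like = 0.05 * rng.standard_normal(t.size)
```

The ACF-, AMDF-, WACF-, and cepstrum-based classifiers replace the two features above with a periodicity measure computed per frame, but the frame-by-frame thresholded decision has the same overall shape.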

References

  1. S. Ahmadi and A. Spanias, Cepstrum-based pitch detection using a new statistical V/UV classification algorithm, IEEE Trans. Speech, Audio Process. 7 (1999), no. 3, 333-338. https://doi.org/10.1109/89.759042
  2. A. Mousa, Speech segmentation in synthesized speech morphing using pitch shifting, Int. Arab J. Inf. Technol. 8 (2011), no. 2, 221-226.
  3. S. Kumar, S. K. Singh, and S. Bhattacharya, Performance evaluation of an ACF-AMDF based pitch detection scheme in real time, Int. J. Speech Technol. 18 (2015), no. 4, 521-527. https://doi.org/10.1007/s10772-015-9296-2
  4. Y. Faycal and M. Bensebti, Comparative performance study of several features for voiced/non-voiced classification, Int. Arab J. Inf. Technol. 11 (2014), no. 3, 293-299.
  5. R. G. Bachu et al., Voiced/Unvoiced decision for speech signals based on zero-crossing rate and energy, Advanced Techniques in Computing Sciences and Software Engineering, K. Elleithy (eds), Springer, Dordrecht, Netherlands, 2010, pp. 279-282.
  6. L. Janer, J. J. Bonet, and E. L. Solano, Pitch detection and voiced/unvoiced decision algorithm based on wavelet transforms, in Proc. Int. Conf. Spoken Language Process. (Philadelphia, PA, USA), Oct. 1996, pp. 1209-1212.
  7. S. Kumar et al., Performance evaluation of a wavelet-based pitch detection scheme, Int. J. Speech Technol. 16 (2013), no. 4, 431-417. https://doi.org/10.1007/s10772-013-9194-4
  8. K. M. Hassan, E. Hamid, and K. I. Molla, A method for voiced/unvoiced classification of noisy speech by analyzing time-domain features of spectrogram image, Sci. J. Circuits, Syst. Signal Process. 6 (2017), no. 2, 11-17. https://doi.org/10.11648/j.cssp.20170602.12
  9. B. S. Atal and L. R. Rabiner, A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, IEEE Trans. Acoust., Speech, Signal Process. 24 (1976), no. 3, 201-212. https://doi.org/10.1109/TASSP.1976.1162800
  10. J. K. Shah et al., Robust voiced/unvoiced classification using novel features and Gaussian mixture model, 2004, available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.618.2362&rep=rep1&type=pdf
  11. Y. Qi and B. R. Hunt, Voiced-unvoiced-silence classification of speech using hybrid features and a network classifier, IEEE Trans. Speech, Audio Process. 1 (1993), no. 2, 250-255. https://doi.org/10.1109/89.222883
  12. T. Drugman et al., Traditional machine learning for pitch detection, IEEE Signal Process. Lett. 25 (2018), no. 11, 1745-1749. https://doi.org/10.1109/lsp.2018.2874155
  13. S. Bagavathi and S. I. Padma, Neural network based voiced and unvoiced classification using EGG and MFCC feature, Int. Research J. Eng. Technol. 4 (2017), no. 4, 1934-1937.
  14. A. Bendiksen and K. Steiglitz, Neural networks for voiced/unvoiced speech classification, in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process. (Albuquerque, NM, USA), Apr. 1990, pp. 521-524.
  15. B. H. Juang and L. R. Rabiner, Spectral representations for speech recognition by neural networks-A tutorial, in Proc. Neural Netw. Signal Process. II Proc. IEEE Workshop (Helsingoer, Denmark), 1992, pp. 214-222.
  16. M. Sharma and R. Mammone, Automatic speech segmentation using neural tree networks, in Proc. IEEE Workshop Neural Netw. Signal Process. (Cambridge, MA, USA), 1995, pp. 282-290.
  17. K. Khaldi, A. Boudraa, and M. Turki, Voiced/unvoiced speech classification-based adaptive filtering of decomposed empirical modes for speech enhancement, IET Signal Process. 10 (2016), no. 1, 69-80. https://doi.org/10.1049/iet-spr.2013.0425
  18. Z. Ali and M. Talha, Innovative method for unsupervised voice activity detection and classification of audio segments, IEEE Access 6 (2018), 15494-15504. https://doi.org/10.1109/access.2018.2805845
  19. G. Sun et al., The complexity analysis of voiced and unvoiced speech signal based on sample entropy, in Proc. Int. Conf. Math. Comput. Sci. Industry (Corfu, Greece), Aug. 2017, pp. 26-29.
  20. K. Struwe, Voiced-unvoiced classification of speech using a neural network trained with LPC coefficients, in Proc. Int. Conf. Contr., Artif. Intell., Robot. Opt. (Prague, Czech Republic), May 2017, pp. 56-59.
  21. S. S. Park, J. W. Shin, and N. S. Kim, Automatic speech segmentation with multiple statistical models, in Proc. INTERSPEECH 2006 - ICSLP (Pittsburgh, PA, USA), 2006, pp. 2066-2069.
  22. S. Bhattacharya, S. K. Singh, and T. Abhinav, Performance evaluation of LPC and cepstral speech coder in simulation and in real time, in Proc. Int. Conf. Recent Adv. Inf. Technol. (Dhanbad, India), Mar. 2012, pp. 826-831.
  23. S. Kumar, Performance evaluation of a novel AMDF-based pitch detection scheme, ETRI J. 38 (2016), no. 3, 425-434. https://doi.org/10.4218/etrij.16.0115.0926
  24. C. Yeh and C. Zhuo, An efficient complexity reduction algorithm for G.729 speech codec, Comput. Math. Applicat. 64 (2012), no. 5, 887-896. https://doi.org/10.1016/j.camwa.2012.01.048
  25. G. Pirker et al., A pitch tracking corpus with evaluation on multipitch tracking scenario, in Proc. Interspeech - Int. Conf. Spoken Language Process. (Florence, Italy), 2011, pp. 1509-1512.
  26. Y. Hu and P. Loizou, Subjective evaluation and comparison of speech enhancement algorithms, Speech Commun. 49 (2007), no. 7-8, 588-601. https://doi.org/10.1016/j.specom.2006.12.006
  27. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-time processing of speech signals, Wiley, Piscataway, NJ, USA, 2000, pp. 570-579.
  28. ITU-T P.862, Perceptual evaluation of speech quality (PESQ), 2004.
  29. S. Kumar, S. Bhattacharya, and P. Patel, A new pitch detection scheme based on ACF and AMDF, in Proc. IEEE Int. Conf. Adv. Commun., Contr. Comput. Technol. (Ramanathapuram, India), 2014, pp. 1235-1240.
  30. S. Kadambe and G. F. Boudreaux-Bartels, Application of the wavelet transform for pitch detection of speech signals, IEEE Trans. Inf. Theory 38 (1992), no. 2, 917-924. https://doi.org/10.1109/18.119752

Cited by

  1. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets vol.21, pp.5, 2021, https://doi.org/10.3390/s21051579