Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

Kumar, Sandeep

doi:10.4218/etrij.2019-0364

ETRI Journal

Volume 43, Issue 1 / Pages 82-94 / 2021 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)


Kumar, Sandeep (Department of Electronics and Communication Engineering, National Institute of Technology)

Received : 2019.07.24
Accepted : 2020.03.24
Published : 2021.02.01

https://doi.org/10.4218/etrij.2019-0364


Abstract

In this work, six voiced/unvoiced speech classifiers are simulated and implemented in real time on the TMS320C6713 DSP starter kit. They are based on the autocorrelation function (ACF), the average magnitude difference function (AMDF), the cepstrum, the weighted ACF (WACF), the zero-crossing rate and energy of the signal (ZCR-E), and neural networks (NNs). The classifiers are integrated into a linear-predictive-coding-based speech analysis-synthesis system, and their performance is compared in terms of voiced/unvoiced classification accuracy (in percent), speech quality, and computation time. In both clean and noisy environments, the NN-based classifier achieves higher classification accuracy and better speech quality than the ACF-, AMDF-, cepstrum-, WACF-, and ZCR-E-based classifiers. The AMDF-based classifier is the computationally simplest and therefore has the shortest computation time, whereas the NN-based classifier has the longest.
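The abstract names the classifiers without giving their decision rules, so the following is only a minimal Python sketch of how two of them, ZCR-E and AMDF, are commonly formulated. It is not the author's TMS320C6713 implementation. The function names, the thresholds, the 8 kHz sampling rate, and the 30 ms frame length are all assumptions chosen for illustration.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced_zcr_energy(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Voiced if energy is high and ZCR is low.

    Both thresholds are hypothetical; in practice they are tuned to the
    signal scaling and the noise level.
    """
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)


def amdf(frame, lag):
    """Average magnitude difference at one lag.

    Only subtractions and absolute values are needed per sample pair,
    with no multiplications.
    """
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def is_voiced_amdf(frame, min_lag, max_lag, depth_threshold=0.4):
    """Voiced if the deepest AMDF valley lies well below the mean AMDF level.

    The ratio test and its threshold are assumptions for this sketch.
    """
    values = [amdf(frame, lag) for lag in range(min_lag, max_lag + 1)]
    mean_level = sum(values) / len(values)
    if mean_level == 0.0:
        return False  # all-zero (silent) frame
    return min(values) / mean_level < depth_threshold


# Illustrative 30 ms frames at an assumed 8 kHz sampling rate.
SAMPLE_RATE = 8000
FRAME_LEN = 240

# Periodic stand-in for voiced speech: a 200 Hz tone (period of 40 samples).
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(FRAME_LEN)]

# Aperiodic, low-level stand-in for unvoiced speech: seeded uniform noise.
rng = random.Random(0)
noise = [rng.uniform(-0.1, 0.1) for _ in range(FRAME_LEN)]

# Lags 20..160 correspond to pitch candidates from 400 Hz down to 50 Hz.
tone_is_voiced = is_voiced_zcr_energy(tone) and is_voiced_amdf(tone, 20, 160)
noise_is_voiced = is_voiced_zcr_energy(noise) or is_voiced_amdf(noise, 20, 160)
```

Because the AMDF needs no multiplications, its low cost in this form is consistent with the abstract's finding that it has the shortest computation time, although the measured figures depend on the actual DSP implementation.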
