A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy

Single-Channel Speech Discrimination in Noisy Environments Using the Wavelet Packet Transform and Teager Energy

  • 구본응 (Department of Electronic Engineering, Kyonggi University)
  • Received : 2013.12.06
  • Accepted : 2014.01.24
  • Published : 2014.03.31

Abstract

In this paper, a feature parameter for voice activity detection (VAD) is obtained by applying the Teager energy to WPD (Wavelet Packet Decomposition) coefficients. The decision threshold is derived from the means and standard deviations estimated over nonspeech frames. Experiments using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm outperforms the typical VAD algorithm used for comparison. ROC (Receiver Operating Characteristic) curves are used to compare VAD performance at SNRs ranging from 10 dB down to -10 dB.
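The feature and threshold described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the Haar wavelet, the decomposition depth `levels`, the way subband Teager energies are combined into one number, and the multiplier `k` are all assumptions, since the abstract does not state them. The function names are hypothetical.

```python
import math
import random
import statistics

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1], interior samples only."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def haar_wpd(x, levels):
    """Full Haar wavelet packet tree (assumed wavelet); returns the 2**levels leaf subbands."""
    bands = [list(x)]
    s = math.sqrt(2.0)
    for _ in range(levels):
        nxt = []
        for b in bands:
            pairs = range(0, len(b) - 1, 2)  # a trailing odd sample is dropped
            nxt.append([(b[i] + b[i + 1]) / s for i in pairs])  # low-pass half
            nxt.append([(b[i] - b[i + 1]) / s for i in pairs])  # high-pass half
        bands = nxt
    return bands

def frame_feature(frame, levels=2):
    """Assumed combination rule: sum over subbands of the mean |Teager energy|."""
    total = 0.0
    for band in haar_wpd(frame, levels):
        te = teager_energy(band)
        total += sum(abs(v) for v in te) / len(te)
    return total

def noise_threshold(noise_features, k=3.0):
    """Threshold = mean + k * standard deviation of the nonspeech-frame features."""
    return statistics.mean(noise_features) + k * statistics.stdev(noise_features)

# Toy demonstration with synthetic data (not TIMIT or NOISEX-92).
rng = random.Random(0)
noise_frames = [[rng.gauss(0, 0.05) for _ in range(256)] for _ in range(20)]
threshold = noise_threshold([frame_feature(f) for f in noise_frames])
tone_frame = [0.5 * math.sin(0.314 * n) + rng.gauss(0, 0.05) for n in range(256)]
print("frame flagged as speech:", frame_feature(tone_frame) > threshold)
```

A frame is flagged as speech when its feature exceeds the threshold estimated from frames assumed to contain only noise. In practice the noise statistics would have to be estimated from the signal itself, for example from leading frames, which this sketch does not attempt.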

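This paper proposes a noise-robust VAD algorithm in which a feature parameter, obtained by applying the Teager energy to WPD (Wavelet Packet Decomposition) coefficients, is fed to a thresholding algorithm. The threshold is set by estimating the mean and standard deviation over nonspeech segments. Experiments using the TIMIT speech and NOISEX noise databases show that the proposed algorithm outperforms the representative existing algorithm used for comparison. Accuracy is compared using ROC (Receiver Operating Characteristic) curves for SNRs from 10 dB down to -10 dB.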
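The ROC comparison mentioned above amounts to sweeping the decision threshold and recording the false-alarm rate against the hit rate at each setting. The sketch below shows one plain way to compute such points and the area under them. The function names, the toy scores, and the labelling convention (1 for speech, 0 for nonspeech) are assumptions for illustration and are not taken from the paper.

```python
def roc_points(scores, labels):
    """Return (false-alarm rate, hit rate) pairs, one per distinct threshold.

    scores: per-frame feature values; labels: 1 for speech, 0 for nonspeech.
    A frame is declared speech when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC points, ordered by false-alarm rate."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Toy scores for four frames: two speech, two nonspeech.
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

Repeating this computation at each SNR, once per detector, gives one curve per detector and noise level. A curve closer to the top-left corner indicates a better trade-off between missed speech and false alarms.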
