A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises

Koo, Boneung

doi:10.7776/ASK.2015.34.4.310

The Journal of the Acoustical Society of Korea (한국음향학회지), Volume 34, Issue 4, pp. 310-315, 2015
ISSN 1225-4428 (print), 2287-3775 (online)

The Acoustical Society of Korea (한국음향학회)




Koo, Boneung (Department of Electronic Engineering, Kyonggi University)


Received: 2015.01.29
Accepted: 2015.04.07
Published: 2015.07.31


Abstract

This paper proposes a single-channel VAD (voice activity detection) algorithm for nonstationary noise environments. The thresholds applied to the feature parameter for the VAD decision are updated adaptively from estimates of the mean and standard deviation computed over past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to WPD (wavelet packet decomposition) coefficients, and has previously been reported to be a noise-robust feature for VAD. Experiments using the TIMIT speech and NOISEX-92 noise databases at SNRs from 10 dB to -10 dB show that the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standardized ones.
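The two mechanisms named in the abstract, a Teager-energy feature and a threshold adapted from non-speech statistics, can be illustrated with a minimal sketch. It is not the paper's algorithm and rests on stated assumptions: the discrete Teager energy operator psi[n] = x[n]^2 - x[n-1]*x[n+1] is applied directly to frame samples instead of to WPD coefficients, and the names `frame_feature` and `AdaptiveThresholdVAD`, the threshold form mean + k*std, the smoothing factor `alpha`, the multiplier `k`, and the assumption that the first `n_init` frames are noise only are all hypothetical choices.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (len(x) - 2 values).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_feature(frame: Sequence[float]) -> float:
    """Hypothetical scalar feature: mean Teager energy of one frame (len >= 3).

    Stand-in for SPD-TE, which applies the operator to WPD subband
    coefficients rather than to the raw samples.
    """
    te = teager_energy(frame)
    return sum(te) / len(te)


class AdaptiveThresholdVAD:
    """Threshold = mean + k * std of the feature over past non-speech frames.

    k, alpha and n_init are hypothetical tuning values, not the paper's.
    """

    def __init__(self, k: float = 3.0, alpha: float = 0.95, n_init: int = 10):
        self.k = k
        self.alpha = alpha
        self.n_init = n_init
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def _update(self, f: float) -> None:
        delta = f - self.mean
        if self.count < self.n_init:
            # Cumulative (Welford-style) mean and population variance
            # over the initial frames.
            self.count += 1
            self.mean += delta / self.count
            self.var += (delta * (f - self.mean) - self.var) / self.count
        else:
            # Exponentially weighted update so the estimate tracks
            # a nonstationary noise floor.
            self.mean += (1 - self.alpha) * delta
            self.var = self.alpha * (self.var + (1 - self.alpha) * delta * delta)

    def threshold(self) -> float:
        return self.mean + self.k * math.sqrt(self.var)

    def decide(self, f: float) -> bool:
        """Return True for speech; only non-speech frames update the statistics."""
        if self.count < self.n_init:
            self._update(f)  # assumption: the first n_init frames are noise only
            return False
        is_speech = f > self.threshold()
        if not is_speech:
            self._update(f)
        return is_speech
```

Because only frames judged as non-speech update the statistics, the threshold follows a drifting noise floor without being pulled upward by speech energy; the trade-off is that a long run of misclassified speech can still bias it.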



References

P. C. Loizou, Speech Enhancement (CRC Press, Boca Raton, 2007), pp. 309-400.
J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett. 6, 1-3 (1999).
ITU, "A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70," ITU-T Recommendation G.729 Annex B (1996).
ETSI EN 301 708 V7.1.1 (1999-12), "Digital cellular telecommunications system (Phase 2+); VAD for AMR speech traffic channels; General description (GSM 06.94 version 7.1.1 Release 1998)," 13-14 (1999).
ETSI ES 202 050 V1.1.5 (2007-01), "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced front-end feature extraction algorithm; Compression algorithms," Annex A.3 Stage 2-VAD Logic, 42-43 (2007).
J. Ramirez, J. C. Segura, C. Benitez, A. Torre, and A. Rubio, "Efficient voice activity detection algorithms using long-term speech information," Speech Commun. 42, 271-287 (2004). https://doi.org/10.1016/j.specom.2003.10.002
A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Trans. Audio, Speech and Lang. Processing 14, 412-414 (2006). https://doi.org/10.1109/TSA.2005.855842
G. Evangelopoulos and P. Maragos, "Multiband modulation energy tracking for noisy speech detection," IEEE Trans. Audio, Speech and Lang. Processing 14, 2024-2038 (2006). https://doi.org/10.1109/TASL.2006.872625
T. V. Pham and T. T. Chien, "Reliable voice activity detection algorithm under adverse environments," in Proc. IEEE Int. Conf. Commun. Electronics, 218-223 (2008).
P. K. Ghosh and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Trans. Audio, Speech and Lang. Processing 19, 600-613 (2011). https://doi.org/10.1109/TASL.2010.2052803
E. Chuangsuwanich and J. Glass, "Robust voice activity detector for real world application using harmonicity and modulation frequency," in Proc. Interspeech, 2645-2648 (2011).
B. Koo, "A single channel voice activity detection for noisy environments using wavelet packet decomposition and Teager energy" (in Korean), J. Acoust. Soc. Kr. 33, 139-145 (2014). https://doi.org/10.7776/ASK.2014.33.2.139
J. Garofolo, "TIMIT acoustic-phonetic continuous speech corpus," LDC93S1 (Linguistic Data Consortium, Philadelphia, 1993).
A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun. 12, 247-251 (1993). https://doi.org/10.1016/0167-6393(93)90095-3
