Voice Activity Detection Based on Discriminative Weight Training with Feedback

Voice Activity Detector Based on Discriminative Weight Training with a Feedback Structure (Korean title)

  • Sang-Ick Kang (School of Electronic Engineering, Inha University);
  • Joon-Hyuk Chang (School of Electronic Engineering, Inha University)
  • Published: 2008.11.30

Abstract

One of the key issues in practical speech processing is to achieve robust Voice Activity Detection (VAD) against background noise. Most statistical model-based approaches employ equally weighted likelihood ratios (LRs), which, however, deviate from real observations. Furthermore, voice activity in adjacent frames is strongly correlated; in other words, the current frame is highly correlated with the previous frame. In this paper, we propose an effective VAD approach based on a minimum classification error (MCE) method, which differs from previous work in that distinct weights are assigned to both the likelihood ratio of the current frame and the decision statistic of the previous frame.
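As a sketch, the frame-level decision described above can be expressed as a weighted sum of per-channel log-likelihood ratios plus a feedback term from the previous frame's decision statistic (a minimal illustration assuming the weights have already been trained; all names and the threshold are hypothetical, not the paper's exact notation):

```python
import numpy as np

def vad_decision(log_lr, prev_stat, weights, w_fb, threshold=0.0):
    """Weighted-LR VAD decision with feedback.

    log_lr:    per-frequency-channel log-likelihood ratios of the current frame
    prev_stat: decision statistic of the previous frame (the feedback term)
    weights:   per-channel weights (trained discriminatively, e.g. via MCE)
    w_fb:      weight on the previous frame's decision statistic

    Returns (is_speech, stat): the binary decision and the decision statistic,
    which is fed back into the next frame's decision.
    """
    stat = float(np.dot(weights, log_lr)) + w_fb * prev_stat
    return stat > threshold, stat
```

With uniform weights and `w_fb = 0`, this reduces to the conventional equally weighted LR test; the feedback term lets a confidently voiced previous frame bias the current decision toward speech.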

One of the most important issues in speech signal processing under real-world background noise in mobile communications is the design of a robust voice activity detector. The statistical model-based approach, widely used as a representative VAD because it is relatively simple yet performs well, forms the voice activity decision rule from the likelihood ratio of each frequency channel. Recently, a VAD whose decision rule uses per-frequency-channel weighted likelihood ratios obtained through discriminative weight training was proposed and showed relatively superior performance. In this study, we propose a new scheme that augments the input vector of the conventional discriminative weight training with the decision statistic of the previous frame in a feedback structure. The proposed method was evaluated and compared by objective measures in non-stationary noise environments and showed superior performance.
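The discriminative weight training itself can be sketched with a standard MCE/GPD-style update: a sigmoid loss on a misclassification measure, minimized by gradient descent. This is a common textbook formulation, not necessarily the paper's exact recipe; the variable names, learning rate, and smoothing constant are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mce_update(weights, x, label, lr=0.1, gamma=1.0):
    """One generalized-probabilistic-descent step for MCE weight training.

    x:     input vector (per-channel log-LRs, possibly including the
           previous frame's decision statistic as a feedback feature)
    label: +1 for a speech frame, -1 for a non-speech frame
    gamma: steepness of the sigmoid loss

    Minimizes loss = sigmoid(gamma * d) with misclassification
    measure d = -label * g(x; w), where g(x; w) = w . x.
    """
    g = float(np.dot(weights, x))          # discriminant (decision statistic)
    d = -label * g                         # misclassification measure
    s = sigmoid(gamma * d)
    loss_grad = gamma * s * (1.0 - s)      # d(loss)/d(d)
    # chain rule: d(d)/d(weights) = -label * x
    return weights - lr * loss_grad * (-label) * x
```

Each update nudges the decision statistic toward the correct sign for the labeled frame, so weights on informative channels (and on the feedback term) grow while uninformative ones stay small.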
