Voice Activity Detection Based on Real-Time Discriminative Weight Training

  • Kang, Sang-Ick (강상익), Department of Electronics Engineering, Inha University
  • Jo, Q-Haing (조규행), Department of Electronics Engineering, Inha University
  • Chang, Joon-Hyuk (장준혁), Department of Electronics Engineering, Inha University
  • Published: 2008.07.25

Abstract

This paper proposes an optimized likelihood ratio test (LRT) based on real-time discriminative weight training with the power spectral flatness measure (PSFM), in order to improve statistical model-based voice activity detection (VAD) in various noise environments. We first analyze the conventional statistical model-based VAD. Building on that analysis, we derive a weight for each frequency bin with the minimum classification error (MCE) method and express the VAD decision rule as the geometric mean of the optimally weighted LRTs. Unlike previous work, the weights are assigned per frequency bin and per noise environment depending on the PSFM, so a different set of weights is applied in real time to every frame. The proposed algorithm was compared with previously proposed VADs in various noise environments, and the experimental results show that it is effective for statistical model-based VAD using the LRT and shows superior performance.
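The abstract combines three ingredients: the per-bin LRT of the Gaussian statistical model, a PSFM-dependent choice of per-bin weights, and MCE training of those weights. The following Python sketch illustrates how these pieces could fit together under stated assumptions; it is not the authors' implementation. Assumptions: the a priori SNR uses a simple maximum-likelihood estimate rather than a decision-directed one, the weighted decision statistic is normalized by the number of bins K, and the PSFM-to-weight-set mapping via fixed thresholds is a hypothetical rule. All function names (`log_lr_per_bin`, `psfm`, `select_weights`, `weighted_score`, `vad_decision`, `mce_step`) and constants (`XI_MIN`, the threshold `edges`) are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical floor on the a priori SNR estimate (about -25 dB).
XI_MIN = 10 ** (-25 / 10)


def log_lr_per_bin(power: Sequence[float], noise_psd: Sequence[float]) -> List[float]:
    """Per-bin log likelihood ratio under the Gaussian statistical model.

    log L_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k).
    Simplification (assumption): xi_k is the ML estimate max(gamma_k - 1, XI_MIN)
    rather than a decision-directed estimate.
    """
    out = []
    for p, n in zip(power, noise_psd):
        gamma = p / n                    # a posteriori SNR
        xi = max(gamma - 1.0, XI_MIN)    # a priori SNR estimate
        out.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return out


def psfm(power: Sequence[float], eps: float = 1e-12) -> float:
    """Power spectral flatness: geometric mean over arithmetic mean, in (0, 1]."""
    mean_log = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(mean_log) / (sum(power) / len(power) + eps)


def select_weights(flatness: float, weight_sets: Sequence[List[float]],
                   edges: Sequence[float]) -> List[float]:
    """Hypothetical rule: pick the weight set of the PSFM interval the frame falls in.

    edges must be ascending and len(weight_sets) == len(edges) + 1.
    """
    idx = sum(1 for e in edges if flatness >= e)
    return weight_sets[idx]


def weighted_score(llr: Sequence[float], weights: Sequence[float]) -> float:
    """Log of the weighted geometric mean of the per-bin likelihood ratios."""
    return sum(w * l for w, l in zip(weights, llr)) / len(llr)


def vad_decision(llr: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """True (speech) if the weighted score exceeds the threshold."""
    return weighted_score(llr, weights) > threshold


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mce_step(weights: Sequence[float], llr: Sequence[float], is_speech: bool,
             threshold: float, alpha: float = 1.0, step: float = 0.01) -> List[float]:
    """One generalized probabilistic descent step on a sigmoid MCE loss."""
    g = weighted_score(llr, weights) - threshold   # > 0 favours "speech"
    d = -g if is_speech else g                     # misclassification measure
    loss = _sigmoid(alpha * d)                     # smoothed 0/1 error
    sign = -1.0 if is_speech else 1.0              # derivative of d with respect to g
    scale = step * alpha * loss * (1.0 - loss) * sign / len(llr)
    return [w - scale * l for w, l in zip(weights, llr)]


if __name__ == "__main__":
    noise_psd = [1.0, 1.0, 1.0, 1.0]
    speech_llr = log_lr_per_bin([10.0, 8.0, 1.0, 1.0], noise_psd)
    noise_llr = log_lr_per_bin([1.0, 1.1, 0.9, 1.0], noise_psd)
    w = [1.0] * 4  # unit weights reduce to the unweighted geometric-mean rule
    print(vad_decision(speech_llr, w, 0.5), vad_decision(noise_llr, w, 0.5))  # prints: True False
```

With all weights equal to 1, `weighted_score` is the log of the plain geometric mean of the likelihood ratios, i.e. the unweighted LRT decision. Training moves the weights so that bins which separate speech from noise well count more; the sigmoid makes the 0/1 classification error differentiable so that gradient steps are possible.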
