Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization

  • Kim, Dong Kook (Chonnam National University, School of Electronic and Computer Engineering) ;
  • Shin, Jong Won (Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science) ;
  • Kwon, Kisoo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications) ;
  • Kim, Nam Soo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications)
  • Received: 2016.05.12
  • Reviewed: 2016.07.12
  • Published: 2016.08.31

Abstract

This paper presents a new statistical voice activity detection (VAD) method based on the probabilistic interpretation of non-negative matrix factorization (NMF). When the data matrix, given the basis and encoding matrices, is modeled with Poisson distributions, the negative log-likelihood of the data coincides with the NMF objective function based on the Kullback-Leibler divergence, up to terms that do not depend on the factor matrices. Building on this probabilistic NMF, the magnitude spectra of speech and noise in the DFT domain are modeled as Poisson distributions, and a new likelihood ratio decision rule is derived for VAD. Experimental results show that the proposed approach outperforms conventional Gaussian model-based and NMF-based methods under simulated conditions with signal-to-noise ratios of 0 to 15 dB.
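The relationship between the Kullback-Leibler objective and the Poisson likelihood can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names and toy spectra are hypothetical, `math.lgamma` is used to extend the factorial to non-integer magnitudes, and the speech-present rate is assumed to be the sum of the noise and speech rates.

```python
import math

def kl_divergence(v, lam):
    """Generalized KL divergence D(v || lam), summed over bins (v, lam > 0)."""
    return sum(x * math.log(x / l) - x + l for x, l in zip(v, lam))

def poisson_nll(v, lam):
    """Negative log-likelihood of v under independent Poisson(lam) bins.
    lgamma(x + 1) stands in for log(x!) so non-integer magnitudes work."""
    return sum(l - x * math.log(l) + math.lgamma(x + 1.0)
               for x, l in zip(v, lam))

def log_likelihood_ratio(v, lam_noise, lam_speech):
    """log p(v | speech present) - log p(v | noise only).
    Assumption: under speech presence the Poisson rate is noise + speech."""
    lam_present = [n + s for n, s in zip(lam_noise, lam_speech)]
    return poisson_nll(v, lam_noise) - poisson_nll(v, lam_present)

# Toy magnitude spectrum and two candidate reconstructions (hypothetical values).
v = [4.0, 9.0, 2.5]
lam_a = [3.0, 8.0, 2.0]
lam_b = [5.0, 10.0, 4.0]

# The gap between the two objectives depends only on v, not on the model,
# so minimizing one over the factors minimizes the other.
gap_a = poisson_nll(v, lam_a) - kl_divergence(v, lam_a)
gap_b = poisson_nll(v, lam_b) - kl_divergence(v, lam_b)
print(abs(gap_a - gap_b) < 1e-9)

# Toy likelihood ratio test: a positive value favours speech presence.
lam_noise = [1.0, 1.0, 1.0]
lam_speech = [4.0, 6.0, 3.0]
print(log_likelihood_ratio([5.0, 7.0, 4.0], lam_noise, lam_speech) > 0.0)
print(log_likelihood_ratio([1.0, 1.0, 1.0], lam_noise, lam_speech) > 0.0)
```

In practice the rates would come from NMF reconstructions of the noise and speech spectra, and the ratio would be compared against a tuned threshold rather than zero; both details are outside what the abstract specifies.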
