Speech Basis Matrix Using Noise Data and NMF-Based Speech Enhancement Scheme

  • Kwon, Kisoo (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
  • Kim, Hyung Young (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
  • Kim, Nam Soo (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
  • Received: 2014.12.04
  • Reviewed: 2015.04.15
  • Published: 2015.04.30

Abstract

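This paper addresses a speech enhancement technique based on non-negative matrix factorization (NMF). A basis matrix is obtained for speech and for noise through appropriate training, and these matrices are then used to separate the two sources. In particular, we show that speech enhancement performance depends strongly on which basis matrices are used. Experiments confirmed that a speech basis matrix optimized so that it is ill-suited to reconstructing noise data yields higher speech enhancement performance than a conventional speech basis matrix trained independently. Two ways of achieving this were applied and compared: directly increasing the reconstruction error of the noise data, and decreasing the values of the corresponding elements of the encoding matrix. By obtaining a basis matrix that is more specialized for reconstructing speech alone, the use of the speech basis matrix to reconstruct noise data is minimized. In the experiments, the perceptual evaluation of speech quality (PESQ) score and the signal-to-distortion ratio (SDR) were used as metrics, and the proposed basis matrices outperformed the basis matrix used in the conventional method.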
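The supervised NMF pipeline described above (train one basis matrix per source, then separate a mixture with both bases held fixed) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes non-negative magnitude spectrograms as input, the generalized Kullback-Leibler divergence with Lee-Seung multiplicative updates, and a Wiener-style soft mask for the final estimate. The function names `nmf_kl` and `enhance` and all default parameters are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """KL-divergence NMF, V ~= W @ H, via Lee-Seung multiplicative updates.

    If W is None, both W (F x rank) and H are learned (training phase).
    If W is given, it is held fixed and only H is estimated (separation phase).
    """
    rng = np.random.default_rng(seed)
    learn_w = W is None
    if learn_w:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Encoding (activation) update
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if learn_w:
            # Basis update
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
            # Normalize each basis column to unit sum and move the scale
            # into H, so that W @ H is (up to EPS) unchanged.
            norms = W.sum(axis=0, keepdims=True) + EPS
            W /= norms
            H *= norms.T
    return W, H


def enhance(V_noisy, W_speech, W_noise, n_iter=200):
    """Estimate the speech magnitude from a noisy one using fixed bases."""
    W = np.hstack([W_speech, W_noise])
    _, H = nmf_kl(V_noisy, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]  # speech part of the model
    N_hat = W_noise @ H[k:]   # noise part of the model
    mask = S_hat / (S_hat + N_hat + EPS)  # Wiener-style soft mask in [0, 1)
    return mask * V_noisy
```

In a real system `V_noisy` would typically be the magnitude STFT of the noisy signal, and the masked magnitude would be combined with the noisy phase before the inverse STFT; those steps are omitted here. The quality of the result hinges on `W_speech` and `W_noise`, which is exactly the dependence the paper investigates.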

This paper presents a speech enhancement method using non-negative matrix factorization (NMF). In the training phase, a basis matrix for each source signal is obtained from an appropriate database, and these basis matrices are then used for source separation. The performance of speech enhancement therefore depends heavily on the basis matrices. The proposed method, in which the speech basis matrix is trained to yield a high reconstruction error for the noise signal, performs better than standard NMF, in which each basis matrix is trained independently. For comparison, we propose a second method and also evaluate a previously published one. In the experiments, performance is measured by the perceptual evaluation of speech quality (PESQ) and the signal-to-distortion ratio (SDR), and the proposed method outperforms the other methods.
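The idea of a speech basis that reconstructs noise poorly can be sketched by subtracting a weighted noise-reconstruction term from the usual speech-fitting objective. Everything specific below is an assumption for illustration, since the abstract does not give the paper's cost function or update rules: the objective, the weight `mu`, the heuristic multiplicative update obtained by splitting the gradient into positive and negative parts, and the hypothetical names `kl_div` and `train_speech_basis_disc`.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_div(V, Lam):
    """Generalized KL divergence D(V || Lam), summed over all entries."""
    return float(np.sum(V * np.log((V + EPS) / (Lam + EPS)) - V + Lam))


def train_speech_basis_disc(V_s, V_n, rank, mu=0.1, n_iter=200, seed=0):
    """Heuristic sketch of a speech basis W that fits speech V_s while the
    best reconstruction of noise V_n from W alone stays poor.

    Assumed objective (not taken from the paper):
        J(W) = D(V_s | W H) - mu * min_G D(V_n | W G)
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V_s.shape[0], rank)) + EPS
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((rank, V_s.shape[1])) + EPS  # speech encodings
    G = rng.random((rank, V_n.shape[1])) + EPS  # noise encodings using W only
    ones_s, ones_n = np.ones_like(V_s), np.ones_like(V_n)
    for _ in range(n_iter):
        # Inner minimizations: standard KL updates for both encodings
        H *= (W.T @ (V_s / (W @ H + EPS))) / (W.T @ ones_s + EPS)
        G *= (W.T @ (V_n / (W @ G + EPS))) / (W.T @ ones_n + EPS)
        R_s = V_s / (W @ H + EPS)
        R_n = V_n / (W @ G + EPS)
        # dJ/dW = (ones_s H^T - R_s H^T) - mu * (ones_n G^T - R_n G^T);
        # split into positive and negative parts for a multiplicative step.
        pos = ones_s @ H.T + mu * (R_n @ G.T)
        neg = R_s @ H.T + mu * (ones_n @ G.T)
        W *= neg / (pos + EPS)
        # Fix the scale of each basis column; compensate in the encodings.
        norms = W.sum(axis=0, keepdims=True) + EPS
        W /= norms
        H *= norms.T
        G *= norms.T
    return W, H
```

With `mu=0` the basis update reduces to standard KL-NMF training of the speech basis, which gives a convenient baseline; larger `mu` trades speech fit for worse noise reconstruction. Because part of the objective is maximized, this multiplicative step is a heuristic without a monotonic-convergence guarantee, so in practice one would monitor `kl_div` on held-out speech and noise data.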
