The Study on Speaker Change Verification Using SNR based weighted KL distance

  • Cho, Joon-Beom (Department of Nursing, Nambu University) ;
  • Lee, Ji-eun (Department of Living physical Training Special Study, Chunnam Techno University) ;
  • Lee, Kyong-Rok (Department of IT & Design, Nambu University)
  • Received : 2017.11.08
  • Accepted : 2017.12.20
  • Published : 2017.12.31

Abstract

In this paper, we experiment with improving speaker change verification performance on broadcast news by enhancing the noisy input speech and by applying a KL (Kullback-Leibler) distance $D_s$ that uses an SNR (signal-to-noise ratio) based weighting function $w_m$. The baseline system (Experiment 0) is a speaker change verification system using the GMM-UBM (Gaussian mixture model-universal background model) based KL distance $D$. Experiment 1 applies MMSE Log-STSA (minimum mean-square error log-spectral amplitude estimator) to enhance the noisy input speech of Experiment 0. Experiment 2 replaces the conventional KL distance $D$ of Experiment 1 with $D_s$. The experimental database was built mainly from sports news and outdoor interviews in order to reflect a variety of noise conditions. Experiments were conducted at 0% MDR (missed detection rate) so that no speaker change information would be lost. Experiment 0 yielded a FAR (false alarm rate) of 71.5%. Experiment 1 yielded a FAR of 67.3%, which is 4.2 percentage points lower than that of Experiment 0, and Experiment 2 yielded a FAR of 60.7%, which is 10.8 percentage points lower than that of Experiment 0.
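The 0% MDR operating point used in the experiments can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that each candidate change point has a distance score, that a candidate is accepted as a speaker change when its score is at or above a threshold, and that FAR is the fraction of non-change candidates that are accepted; the abstract does not state the exact FAR definition. The function name `far_at_zero_mdr` and the sample scores are hypothetical.

```python
from typing import Sequence, Tuple


def far_at_zero_mdr(
    scores: Sequence[float], is_change: Sequence[bool]
) -> Tuple[float, float]:
    """Return (threshold, FAR) at the strictest threshold that misses no true change.

    Assumption: a candidate is accepted as a speaker change when
    score >= threshold, and FAR = accepted non-changes / all non-changes.
    """
    change_scores = [s for s, c in zip(scores, is_change) if c]
    non_change_scores = [s for s, c in zip(scores, is_change) if not c]
    if not change_scores or not non_change_scores:
        raise ValueError("need at least one change and one non-change candidate")

    # The highest threshold that still accepts every true change gives MDR = 0%.
    threshold = min(change_scores)

    # Non-change candidates scoring at or above the threshold are false alarms.
    false_alarms = sum(1 for s in non_change_scores if s >= threshold)
    return threshold, false_alarms / len(non_change_scores)


if __name__ == "__main__":
    # Hypothetical distance scores and ground-truth labels, for illustration only.
    scores = [0.9, 0.4, 0.3, 0.2, 0.5, 0.1]
    labels = [True, False, True, False, False, False]
    threshold, far = far_at_zero_mdr(scores, labels)
    print(f"threshold={threshold:.2f}, FAR={far:.1%}")
```

Under this setup, a distance measure that separates true changes from non-changes more cleanly pushes fewer non-change scores above the threshold, which is how a lower FAR at a fixed 0% MDR indicates better verification.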
