The Study on Speaker Change Verification Using SNR based weighted KL distance

Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok;

doi:10.22156/CS4SMB.2017.7.6.159

Journal of Convergence for Information Technology (융합정보논문지)

Volume 7, Issue 6, pp. 159-166, 2017, eISSN 2586-4440

Convergence Society for SMB (중소기업융합학회)


Cho, Joon-Beom (Department of Nursing, Nambu University) ;
Lee, Ji-eun (Department of Living physical Training Special Study, Chunnam Techno University) ;
Lee, Kyong-Rok (Department of IT & Design, Nambu University)


Received : 2017.11.08
Accepted : 2017.12.20
Published : 2017.12.31


Abstract

This paper reports experiments aimed at improving speaker change verification on broadcast news. Two measures are tested: enhancing the noisy input speech, and replacing the conventional KL (Kullback-Leibler) distance $D$ with a KL distance $D_s$ that applies an SNR (signal-to-noise ratio) based weighting function $w_m$. The baseline (Experiment 0) is a speaker change verification system using the GMM-UBM (Gaussian Mixture Model-Universal Background Model) based KL distance $D$. Experiment 1 adds enhancement of the noisy input speech with MMSE Log-STSA (the minimum mean-square error log-spectral amplitude estimator) to the baseline. Experiment 2 replaces $D$ with $D_s$ in the system of Experiment 1. The experimental database was built mainly from sports news and outdoor interviews so that it reflects a variety of noise conditions. All results are reported at the operating point of 0% MDR (missed detection rate), so that no speaker change information is lost. Experiment 0 gave a FAR (false alarm rate) of 71.5%. Experiment 1 gave a FAR of 67.3%, an improvement of 4.2 percentage points over Experiment 0, and Experiment 2 gave a FAR of 60.7%, an improvement of 10.8 percentage points over Experiment 0.
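The evaluation condition in the abstract (FAR measured at 0% MDR) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each candidate boundary has a distance score, that a candidate is accepted as a speaker change when its score is at or above a threshold, and that FAR is the fraction of non-change candidates accepted; the abstract does not state the paper's exact FAR definition. The function name `far_at_zero_mdr` and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def far_at_zero_mdr(scores: Sequence[float],
                    is_change: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, FAR) at the operating point with no missed changes.

    Assumption: a candidate is accepted as a speaker change when
    score >= threshold, and FAR = accepted non-changes / all non-changes.
    """
    change_scores = [s for s, c in zip(scores, is_change) if c]
    non_change_scores = [s for s, c in zip(scores, is_change) if not c]
    if not change_scores or not non_change_scores:
        raise ValueError("need at least one change and one non-change candidate")

    # Largest threshold that still accepts every true change (MDR = 0%).
    threshold = min(change_scores)

    # Non-change candidates that pass the threshold are false alarms.
    false_alarms = sum(1 for s in non_change_scores if s >= threshold)
    return threshold, false_alarms / len(non_change_scores)


# Toy data (hypothetical): 3 true changes and 4 non-changes.
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.5, 0.1, 0.8]
toy_labels = [True, False, True, False, True, False, False]
toy_threshold, toy_far = far_at_zero_mdr(toy_scores, toy_labels)

# FAR reductions reported in the abstract, in percentage points.
reduction_exp1 = round(71.5 - 67.3, 1)
reduction_exp2 = round(71.5 - 60.7, 1)
```

Fixing MDR at 0% pushes the threshold down to the weakest true change, so FAR is high by construction. This is consistent with the large FAR values reported, and a better-separated distance measure lowers FAR at the same operating point.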
