Performance Comparison of Deep Feature Based Speaker Verification Systems

Kim, Dae Hyun; Seong, Woo Kyeong; Kim, Hong Kook

doi:10.13064/KSSS.2015.7.4.009

Phonetics and Speech Sciences

Vol. 7, No. 4 / pp. 9-16 / 2015 / pISSN 2005-8063 / eISSN 2586-5854

Korean Society of Speech Sciences


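Kim, Dae Hyun (Gwangju Institute of Science and Technology);
Seong, Woo Kyeong (Gwangju Institute of Science and Technology);
Kim, Hong Kook (Gwangju Institute of Science and Technology)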

Received: 2015.08.29
Reviewed: 2015.12.09
Published: 2015.12.31


Abstract

This paper compares speaker verification (SV) systems built on features derived from deep neural networks (DNNs). First, three input features for the DNN, namely mel-frequency cepstral coefficients (MFCCs), linear-frequency cepstral coefficients (LFCCs), and perceptual linear prediction (PLP) coefficients, are compared in terms of SV performance. Next, the effects of the DNN training method and the hidden-layer structure on SV performance are investigated for each feature type. Each SV system is then evaluated using either I-vector or probabilistic linear discriminant analysis (PLDA) scoring. The SV experiments show that a tandem feature combining the DNN bottleneck feature with the MFCC feature gives the best performance when the DNN uses a rectangular hidden-layer structure and is trained with a supervised method.
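The tandem feature named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the sigmoid activation, the linear bottleneck output, the random placeholder weights, and the reading of "rectangular" as equal-width hidden layers are all assumptions made here for illustration, and the function names `bottleneck_features` and `tandem_features` are hypothetical.

```python
import numpy as np

def bottleneck_features(frames, weights, biases, bottleneck_index):
    """Forward frames through a feed-forward DNN and return the
    pre-activation output of the layer chosen as the bottleneck."""
    h = frames
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = h @ W + b
        if i == bottleneck_index:
            return z  # assumption: the bottleneck output is taken before the nonlinearity
        h = 1.0 / (1.0 + np.exp(-z))  # assumption: sigmoid hidden units
    raise ValueError("bottleneck_index is out of range")

def tandem_features(mfcc, bottleneck):
    """Concatenate MFCC and bottleneck vectors frame by frame."""
    return np.concatenate([mfcc, bottleneck], axis=1)

# Placeholder dimensions (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_frames, mfcc_dim, hidden_dim, bn_dim = 200, 39, 512, 40

# Equal-width ("rectangular") hidden layers followed by a narrow bottleneck.
layer_dims = [mfcc_dim, hidden_dim, hidden_dim, bn_dim]
weights = [rng.standard_normal((a, b)) * 0.01
           for a, b in zip(layer_dims[:-1], layer_dims[1:])]
biases = [np.zeros(b) for b in layer_dims[1:]]

# Random stand-in for real MFCC frames; a trained network would replace the random weights.
mfcc = rng.standard_normal((n_frames, mfcc_dim))
bn = bottleneck_features(mfcc, weights, biases, bottleneck_index=2)
tandem = tandem_features(mfcc, bn)
print(tandem.shape)  # (200, 79)
```

In a full system, the tandem vectors would feed the I-vector extractor in place of the raw MFCC frames, and the sketch stops at the bottleneck layer because the layers above it are only needed during DNN training.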

