Multiple Discriminative DNNs for I-Vector Based Open-Set Language Recognition

Kang, Woo Hyun; Cho, Won Ik; Kang, Tae Gyoon; Kim, Nam Soo

doi:10.7840/kics.2016.41.8.958

The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume 41, Issue 8, Pages 958-964, 2016

pISSN 1226-4717, eISSN 2287-3880

The Korean Institute of Communications and Information Sciences (한국통신학회)

Korean title: I-벡터 기반 오픈세트 언어 인식을 위한 다중 판별 DNN

Kang, Woo Hyun (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University)
Cho, Won Ik (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University)
Kang, Tae Gyoon (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University)
Kim, Nam Soo (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University)

Received: 2016.04.25
Accepted: 2016.07.15
Published: 2016.08.31


Abstract

In this paper, we propose an i-vector based language recognition system that identifies the language spoken by a speaker. The system uses multiple discriminative deep neural network (DNN) models, analogous to a multi-class support vector machine (SVM) classification system. The proposed model was trained and tested on the i-vectors included in the NIST 2015 i-vector Machine Learning Challenge database, and in open-set experiments it outperformed conventional language recognition methods such as cosine distance, SVM, and softmax NN classifiers.

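In this paper, we propose an i-vector based language recognition system that uses multiple discriminative deep neural network (DNN) models, in a manner similar to a multi-class SVM, which classifies three or more classes using several binary support vector machines (binary SVMs). The proposed system was trained and tested using the i-vectors included in the NIST 2015 i-vector Machine Learning Challenge database, and it was confirmed to achieve higher performance in the open-set condition than conventional language recognition systems based on cosine distance, multi-class SVM, and a single neural network (NN).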
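
The abstract does not give the network architecture or the exact open-set decision rule, so the following is only a minimal sketch of the general idea: score an i-vector either with a cosine-similarity baseline or with one binary detector per language (one-vs-rest), and reject the input as out-of-set when no score is high enough. The function names, the logistic units standing in for the DNN detectors, the hand-picked weights, and the thresholds are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def cosine_scores(ivector, class_means):
    """Baseline: cosine similarity between an i-vector and each language mean."""
    x = ivector / np.linalg.norm(ivector)
    m = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    return m @ x


def make_detector(weights, bias):
    """Hypothetical binary detector: a single logistic unit standing in for a DNN."""
    def detect(ivector):
        return 1.0 / (1.0 + np.exp(-(weights @ ivector + bias)))
    return detect


def one_vs_rest_scores(ivector, detectors):
    """Score one i-vector with one binary detector per target language."""
    return np.array([detect(ivector) for detect in detectors])


def open_set_decision(scores, labels, threshold):
    """Pick the best-scoring language, or reject if no score reaches the threshold."""
    best = int(np.argmax(scores))
    return labels[best] if scores[best] >= threshold else "out_of_set"


# Toy 2-dimensional "i-vectors" (real i-vectors have hundreds of dimensions).
labels = ["lang_a", "lang_b"]
class_means = np.array([[1.0, 0.0], [0.0, 1.0]])
detectors = [
    make_detector(np.array([4.0, -4.0]), 0.0),  # assumed weights for lang_a
    make_detector(np.array([-4.0, 4.0]), 0.0),  # assumed weights for lang_b
]

in_set = np.array([0.9, 0.1])   # close to lang_a
unknown = np.array([1.0, 1.0])  # equally far from both target languages

print(open_set_decision(cosine_scores(in_set, class_means), labels, 0.8))
print(open_set_decision(one_vs_rest_scores(in_set, detectors), labels, 0.7))
print(open_set_decision(one_vs_rest_scores(unknown, detectors), labels, 0.7))
```

In this sketch each language has its own detector, so an input that no detector claims confidently can be rejected. A single softmax classifier, by contrast, always distributes its probability mass over the known languages.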

