Multiple Discriminative DNNs for I-Vector Based Open-Set Language Recognition

  • Kang, Woo Hyun (Seoul National University, Department of Electrical and Computer Engineering and Institute of New Media and Communications)
  • Cho, Won Ik (Seoul National University, Department of Electrical and Computer Engineering and Institute of New Media and Communications)
  • Kang, Tae Gyoon (Seoul National University, Department of Electrical and Computer Engineering and Institute of New Media and Communications)
  • Kim, Nam Soo (Seoul National University, Department of Electrical and Computer Engineering and Institute of New Media and Communications)
  • Received: 2016.04.25
  • Reviewed: 2016.07.15
  • Published: 2016.08.31


In this paper, we propose an i-vector based language recognition system that identifies the language being spoken using multiple discriminative deep neural network (DNN) models, analogous to a multi-class support vector machine (SVM) that combines several binary SVMs to classify three or more classes. The proposed system was trained and tested on the i-vectors included in the NIST 2015 i-vector Machine Learning Challenge database, and in open-set experiments it outperformed conventional language recognition systems based on cosine distance, multi-class SVM, and a single softmax neural network (NN) classifier.
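
The open-set setting described above can be illustrated with a minimal sketch. It is not the proposed system: the abstract does not specify the DNN architecture or the decision rule, so the sketch uses the cosine distance baseline as a stand-in scorer and applies a one-vs-rest decision with rejection. The function names, the `out_of_set` label, the toy language means, and the threshold values are all assumptions made for illustration.

```python
import numpy as np

def cosine_scores(ivector, language_means):
    """Cosine similarity between one i-vector and each language's mean i-vector."""
    v = ivector / np.linalg.norm(ivector)
    m = language_means / np.linalg.norm(language_means, axis=1, keepdims=True)
    return m @ v

def open_set_decision(scores, labels, threshold):
    """Pick the best-scoring language, or reject when no score reaches the threshold.

    In a multiple-detector system, `scores` would hold one output per
    language-specific model; here they come from the cosine baseline.
    """
    best = int(np.argmax(scores))
    return labels[best] if scores[best] >= threshold else "out_of_set"

# Toy data (hypothetical): three in-set languages in a 3-dimensional space.
labels = ["lang_a", "lang_b", "lang_c"]
means = np.eye(3)

in_set = open_set_decision(cosine_scores(np.array([0.9, 0.1, 0.0]), means), labels, 0.5)
rejected = open_set_decision(cosine_scores(np.array([1.0, 1.0, 1.0]), means), labels, 0.8)
```

With these toy values, the first i-vector lies close to the `lang_a` mean and is accepted, while the second is equally far from all three means and falls below the threshold, so it is labeled out-of-set. Replacing `cosine_scores` with per-language detector outputs gives the general one-vs-rest structure that the abstract compares to a multi-class SVM.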


