Research Trends in Deep Learning Model Adaptation Techniques

  • Published : 2016.08.31

Abstract

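Deep learning can model complex feature spaces by extracting and combining the features inherent in large amounts of input data. However, when a model does not generalize well to a particular data distribution encountered in the test environment, a technique is needed to adapt the model to that environment using data drawn from it. This article surveys research trends through the various adaptation techniques used in acoustic modeling, the field where research on DNN model adaptation is most active.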
