References
- Abdel-Hamid, O., Mohamed, A. R., Jiang, H., & Penn, G. (2012, March). Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4277-4280).
- Chan, W., Jaitly, N., Le, Q., & Vinyals, O. (2016, March). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960-4964).
- Chorowski, J., Bahdanau, D., Cho, K., & Bengio, Y. (2014). End-to-end continuous speech recognition using attention-based recurrent NN: First results [Computing Research Repository]. Retrieved from http://arxiv.org/abs/1412.1602
- Dayhoff, J. E., & DeLeo, J. M. (2001). Artificial neural networks: Opening the black box. Cancer: Interdisciplinary International Journal of the American Cancer Society, 91(S8), 1615-1635.
- Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376).
- Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6645-6649).
- Gunning, D. (2017). Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA).
- Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A. R., Jaitly, N., Senior, A., ... Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97. https://doi.org/10.1109/MSP.2012.2205597
- LeCun, Y., Cortes, C., & Burges, C. J. C. (1998). The MNIST database of handwritten digits. Retrieved from http://yann.lecun.com/exdb/mnist
- Miao, Y., Gowayyed, M., & Metze, F. (2015, December). EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp. 167-174).
- Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. Proceedings of the 11th Annual Conference of the International Speech Communication Association (pp. 1045-1048).
- Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015, April). Librispeech: An ASR corpus based on public domain audio books. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5206-5210).
- Sriram, A., Jun, H., Satheesh, S., & Coates, A. (2017). Cold fusion: Training seq2seq models together with language models [Computing Research Repository]. Retrieved from http://arxiv.org/abs/1708.06426
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (pp. 3104-3112).
- Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Yalta Soplin, N. E., ... Ochiai, T. (2018). ESPnet: End-to-end speech processing toolkit [Computing Research Repository]. Retrieved from http://arxiv.org/abs/1804.00015