References
- Bahdanau, D., Cho, K., & Bengio, Y. (2015, May). Neural machine translation by jointly learning to align and translate. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015. San Diego, CA.
- Chan, W., Jaitly, N., Le, Q., & Vinyals, O. (2016, March). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960-4964). Shanghai, China.
- Chorowski, J. K., Bahdanau, D., Serdyuk, D., Cho, K., & Bengio, Y. (2015). Attention-based models for speech recognition. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems (pp. 577-585). Red Hook, NY: Curran Associates.
- Chorowski, J., Bahdanau, D., Cho, K., & Bengio, Y. (2014, December). End-to-end continuous speech recognition using attention-based recurrent NN: First results. Proceedings of the NIPS 2014 Workshop on Deep Learning. Montreal, Canada.
- Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376). Pittsburgh, PA.
- Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., Senior, A., ... Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97. https://doi.org/10.1109/MSP.2012.2205597
- Ioffe, S., & Szegedy, C. (2015, July). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, PMLR (pp. 448-456). Lille, France.
- Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2177-2185). Red Hook, NY: Curran Associates.
- Luong, M. T., Pham, H., & Manning, C. D. (2015, September). Effective approaches to attention-based neural machine translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1412-1421). Lisbon, Portugal.
- Mikolov, T., Kombrink, S., Burget, L., Černocký, J., & Khudanpur, S. (2011, May). Extensions of recurrent neural network language model. Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5528-5531). Prague, Czech Republic.
- Nair, V., & Hinton, G. E. (2010, June). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML) (pp. 807-814). Haifa, Israel.
- Paul, D. B., & Baker, J. (1992, February). The design for the Wall Street Journal-based CSR corpus. Speech and Natural Language: Proceedings of a Workshop Held at Harriman. Harriman, NY.
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., ... Silovsky, J. (2011, December). The Kaldi speech recognition toolkit. Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. Big Island, HI.
- Ranzato, M. A., Huang, F. J., Boureau, Y. L., & LeCun, Y. (2007, June). Unsupervised learning of invariant feature hierarchies with applications to object recognition. Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-8). Minneapolis, MN.
- Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015, April). Convolutional, long short-term memory, fully connected deep neural networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4580-4584). Brisbane, Australia.
- Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681. https://doi.org/10.1109/78.650093
- Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv. Retrieved from https://arxiv.org/abs/1409.1556
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 3104-3112). Red Hook, NY: Curran Associates.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (pp. 5998-6008). Red Hook, NY: Curran Associates.
- Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Soplin, N. E. Y., ... Renduchintala, A. (2018). ESPnet: End-to-end speech processing toolkit. arXiv. Retrieved from https://arxiv.org/abs/1804.00015