References
- Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (pp. 1097-1105).
- Sainath, T., Kingsbury, B., Saon, G., Soltau, H., Mohamed, A., Dahl, G., & Ramabhadran, B. (2015). Deep convolutional neural networks for large-scale speech tasks. Neural Networks, 64, 39-48.
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
- Srivastava, R., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. Advances in Neural Information Processing Systems 28 (pp. 2377-2385).
- Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700-4708).
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A., & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252. https://doi.org/10.1007/s11263-015-0816-y
- Park, S., Jeong, Y., & Kim, H. (2017). Multiresolution CNN for reverberant speech recognition. Proceedings of the Conference of the Oriental Chapter of the International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA).
- Robinson, T., Fransen, J., Pye, D., Foote, J., & Renals, S. (1995). WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition. Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (pp. 81-84). Detroit, MI.
- Lincoln, M., McCowan, I., Vepa, J., & Maganti, H. (2005). The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (pp. 357-362). San Juan, Puerto Rico.
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., & Vesely, K. (2011). The Kaldi speech recognition toolkit. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2011). Hawaii, 11-15 December 2011.
- Yu, D., Yao, K., & Zhang, Y. (2015). The computational network toolkit. IEEE Signal Processing Magazine, 32(6), 123-126. https://doi.org/10.1109/MSP.2015.2462371
- Qian, Y., Bi, M., Tan, T., & Yu, K. (2016). Very deep convolutional neural networks for noise robust speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), 2263-2276. https://doi.org/10.1109/TASLP.2016.2602884