References
- Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015. https://doi.org/10.1038/nature14539
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," In Advances in Neural Information Processing Systems (NIPS), 2012.
- R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.
- B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, "Learning deep features for scene recognition using Places database," In Advances in Neural Information Processing Systems (NIPS), pp. 487-495, 2014.
- F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815-823, 2015.
- Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701-1708, 2014.
- A. Toshev and C. Szegedy, "DeepPose: Human pose estimation via deep neural networks," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1653-1660, 2014.
- J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, "Joint training of a convolutional network and a graphical model for human pose estimation," In Advances in Neural Information Processing Systems (NIPS), pp. 1799-1807, 2014.
- S.-W. Lee, C.-Y. Lee, D. Kwak, J. Kim, J. Kim, and B.-T. Zhang, "Dual-memory deep learning architectures for lifelong learning of everyday human behaviors," In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1669-1675, 2016.
- C. Park and G. Kim, "Expressing an Image Stream with a Sequence of Natural Sentences," In Advances in Neural Information Processing Systems (NIPS), 2015.
- Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler, "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 19-27, 2015.
- K.-M. Kim, C.-J. Nan, M.-O. Heo, S.-H. Choi, and B.-T. Zhang, "DeepStory: Video story QA by deep embedded memory networks," AAAI 2017 (submitted).
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, and A. Rabinovich, "Going deeper with convolutions," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013. https://doi.org/10.1109/TPAMI.2013.50
- W. W. Zhu, A. Berndsen, E. C. Madsen, M. Tan, I. H. Stairs, A. Brazier, P. Lazarus, R. Lynch, P. Scholz, K. Stovall, et al., "Searching for pulsars using image pattern recognition," The Astrophysical Journal, vol. 781, no. 2, art. no. 117, 2014. https://doi.org/10.1088/0004-637X/781/2/117
- G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006. https://doi.org/10.1126/science.1127647
- R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 448-455, 2009.
- D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," International Conference on Learning Representations (ICLR), 2014.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, and Y. Bengio, "Generative adversarial nets," In Advances in Neural Information Processing Systems (NIPS), pp. 2672-2680, 2014.
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," In Advances in Neural Information Processing Systems (NIPS), pp. 3111-3119, 2013.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
- J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv:1412.3555, 2014.
- W. Zhang and M. Lapata, "Chinese Poetry Generation with Recurrent Neural Networks," In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
- A. Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks," http://karpathy.github.io/2015/05/21/rnn-effectiveness/, 2015.
- http://benjamin.wtf
- K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra, "DRAW: A recurrent neural network for image generation," International Conference on Machine Learning (ICML), 2015.
- A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," International Conference on Machine Learning (ICML), 2016.
- J. Weston, S. Chopra, and A. Bordes, "Memory networks," International Conference on Learning Representations (ICLR), 2015.
- S. Sukhbaatar, J. Weston, and R. Fergus, "End-to-end memory networks," In Advances in Neural Information Processing Systems (NIPS), 2015.
- A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus, and R. Socher, "Ask Me Anything: Dynamic Memory Networks for Natural Language Processing," International Conference on Machine Learning (ICML), 2016.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," In European Conference on Computer Vision (ECCV), pp. 740-755, 2014.
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, "VQA: Visual Question Answering," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. Shamma, M. Bernstein, and L. Fei-Fei, "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations," arXiv:1602.07332, 2016.
- J. Markoff, "A Learning Advance in Artificial Intelligence Rivals Human Abilities," The New York Times, Dec. 10, 2015.
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3156-3164, 2015.
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," International Conference on Machine Learning (ICML), 2015.
- A. Karpathy and L. Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig, "From Captions to Visual Concepts and Back," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- X. Chen and C. L. Zitnick, "Mind's Eye: A Recurrent Visual Representation for Image Caption Generation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, "Zero-shot learning through cross-modal transfer," In Advances in Neural Information Processing Systems (NIPS), pp. 935-943, 2013.
- R. Kiros, R. Salakhutdinov, and R. Zemel, "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models," Transactions of the Association for Computational Linguistics (TACL), to appear.
- J. Ba, K. Swersky, and S. Fidler, "Predicting deep zero-shot convolutional neural networks using textual descriptions," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- E. Mansimov, E. Parisotto, J. Ba, and R. Salakhutdinov, "Generating Images from Captions with Attention," International Conference on Learning Representations (ICLR), 2016.
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative Adversarial Text to Image Synthesis," International Conference on Machine Learning (ICML), 2016.
- A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, "Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding," In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016 (accepted).
- J.-H. Kim, S.-W. Lee, D.-H. Kwak, M.-O. Heo, J. Kim, J.-W. Ha, and B.-T. Zhang, "Multimodal Residual Learning for Visual QA," In Advances in Neural Information Processing Systems (NIPS), 2016 (accepted).
- Q. Wu, D. Teney, P. Wang, C. Shen, A. Dick, and A. van den Hengel, "Visual Question Answering: A Survey of Methods and Datasets," arXiv:1607.05910, 2016.
- K. Kafle and C. Kanan, "Visual Question Answering: Datasets, Algorithms, and Future Challenges," arXiv:1610.01465, 2016.
- A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele, "A dataset for movie description," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler, "MovieQA: Understanding stories in movies through question-answering," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- C. Fan, and D. J. Crandall, "DeepDiary: Automatic Caption Generation for Lifelogging Image Streams," arXiv:1608.03819v1, 2016.
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko, "Translating videos to natural language using deep recurrent neural networks," In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2015.
- A. Rohrbach, M. Rohrbach, and B. Schiele, "The long-short story of movie description," German Conference on Pattern Recognition, Springer International Publishing, 2015.
- R. Kiros, Y. Zhu, R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler, "Skip-thought vectors," In Advances in neural information processing systems (NIPS), pp. 3294-3302, 2015.
- L. J. P. van der Maaten and G. E. Hinton, "Visualizing High-Dimensional Data Using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, Nov. 2008.
- L. Zhu, Z. Xu, Y. Yang, and A. Hauptmann, "Uncovering Temporal Context for Video Question and Answering," arXiv preprint arXiv:1511.04670, 2015.