References
- A. Agrawal, J. Lu, S. Antol, M. Mitchell, C. L. Zitnick, D. Parikh, and D. Batra, "VQA: visual question answering," International Journal of Computer Vision, vol. 123, no. 1, pp. 4-31, 2017. https://doi.org/10.1007/s11263-016-0966-6
- A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. F. Moura, D. Parikh, and D. Batra, "Visual dialog," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 2017, pp. 326-335.
- P. H. Seo, A. Lehrmann, B. Han, and L. Sigal, "Visual reference resolution using attention memory for visual dialog," Advances in Neural Information Processing Systems, vol. 30, pp. 3719-3729, 2017.
- J. Lu, A. Kannan, J. Yang, D. Parikh, and D. Batra, "Best of both worlds: transferring knowledge from discriminative learning to a generative visual dialog model," Advances in Neural Information Processing Systems, vol. 30, pp. 314-324, 2017.
- Q. Wu, P. Wang, C. Shen, I. Reid, and A. van den Hengel, "Are you talking to me? Reasoned visual dialog generation through adversarial learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 6106-6115.
- P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang, "Bottom-up and top-down attention for image captioning and visual question answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 6077-6086.
- L. Peng, Y. Yang, Y. Bin, N. Xie, F. Shen, Y. Ji, and X. Xu, "Word-to-region attention network for visual question answering," Multimedia Tools and Applications, vol. 78, no. 3, pp. 3843-3858, 2019. https://doi.org/10.1007/s11042-018-6389-3
- A. Trott, C. Xiong, and R. Socher, "Interpretable counting for visual question answering," in Proceedings of the 6th International Conference on Learning Representations, Vancouver, Canada, 2018.
- M. T. Desta, L. Chen, and T. Kornuta, "Object-based reasoning in VQA," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, 2018, pp. 1814-1823.