References
- S. Antol, A. Agrawal, and J. Lu, et al., "VQA: Visual Question Answering," in Proceedings of the International Conference on Computer Vision (ICCV), pp.2425-2433, 2015.
- R. Zellers, Y. Bisk, and A. Farhadi, et al., "From Recognition to Cognition: Visual Commonsense Reasoning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6720-6731, 2019.
- P. Wang, Q. Wu, and C. Shen, et al., "FVQA: Fact-based Visual Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol.40, pp.2413-2427, 2017.
- S. Shah, A. Mishra, and N. Yadati, et al., "KVQA: Knowledge-aware Visual Question Answering," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019.
- M. Narasimhan, S. Lazebnik, and A. G. Schwing, "Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering," in Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.2654-2665, 2018.
- P. Anderson, X. He, and C. Buehler, et al., "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6077-6086, 2018.
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola, "Stacked Attention Networks for Image Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.21-29, 2016.
- J. Lu, J. Yang, and D. Batra, et al., "Hierarchical Question-Image Co-Attention for Visual Question Answering," in Proceedings of the Conference on Neural Information Processing Systems (NIPS), pp.289-297, 2016.
- M. Lao, Y. Guo, H. Wang, and X. Zhang, "Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering," IEEE Access, Vol.6, pp.31516-31524, June 2018. https://doi.org/10.1109/ACCESS.2018.2844789
- C. Yang, M. Jiang, B. Jiang, W. Zhou, and K. Li, "Co-Attention Network with Question Type for Visual Question Answering," IEEE Access, Vol.7, pp.40771-40781, Mar. 2019. https://doi.org/10.1109/ACCESS.2019.2908035
- S. Auer, C. Bizer, and G. Kobilarov, et al., "DBpedia: A Nucleus for a Web of Open Data," in The Semantic Web, Springer, Berlin, Heidelberg, 2007.
- K. Bollacker, C. Evans, and P. Paritosh, et al., "Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge," in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.1247-1250, 2008.
- H. Liu and P. Singh, "ConceptNet-A Practical Commonsense Reasoning Tool-kit," British Telecommunications (BT) Technology Journal, Vol.22, pp.211-226, 2004.
- P. Wang, Q. Wu, and C. Shen, et al., "Explicit Knowledge-based Reasoning for Visual Question Answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- K. Marino, M. Rastegari, and A. Farhadi, et al., "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3195-3204, 2019.
- J. Zhou, G. Cui, and Z. Zhang, et al., "Graph Neural Networks: A Review of Methods and Applications," arXiv preprint arXiv:1812.08434, 2018.
- Y. LeCun, B. Boser, and J. Denker, et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, Vol.1, Issue 4, pp.541-551, 1989. https://doi.org/10.1162/neco.1989.1.4.541
- S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol.9, Issue 8, pp.1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," in Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- J. Yang, J. Lu, and S. Lee, et al., "Graph R-CNN for Scene Graph Generation," in Proceedings of the European Conference on Computer Vision (ECCV), pp.670-685, 2018.
- Y. Cao, M. Fang, and D. Tao, "BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering," arXiv preprint arXiv:1904.04969, 2019.
- J. Devlin, M. Chang, and K. Lee, et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.