Acknowledgments
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) [No. 2020-0-00004, Development of Previsional Intelligence Based on Long-Term Visual Memory Network].
References
- J. Johnson et al., "Image retrieval using scene graphs," in Proc. IEEE/CVF CVPR, June 2015, pp. 3668-3678.
- C. Lu et al., "Visual relationship detection with language priors," in Proc. ECCV, Oct. 2016, pp. 852-869.
- R. Krishna et al., "Visual genome: Connecting language and vision using crowdsourced dense image annotations," Int. J. Comput. Vis., vol. 123, no. 1, May 2017, pp. 32-73. https://doi.org/10.1007/s11263-016-0981-7
- J. Ji et al., "Action genome: actions as compositions of spatio-temporal scene graphs," in Proc. IEEE/CVF CVPR, June 2020, pp. 10233-10244.
- Y. Zhong et al., "Comprehensive image captioning via scene graph decomposition," in Proc. ECCV, Aug. 2020, pp. 211-229.
- X. Yang et al., "Auto-encoding and distilling scene graphs for image captioning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, May 2022, pp. 2313-2327.
- X. Lu and Y. Gao, "Guide and interact: Scene graph based generation and control of video captions," Multimed. Syst., vol. 29, no. 2, Apr. 2023, pp. 797-809.
- C. Zhang et al., "An empirical study on leveraging scene graphs for visual question answering," in Proc. BMVC, Sept. 2019.
- L. Li et al., "Relation-aware graph attention network for visual question answering," in Proc. IEEE/CVF ICCV, Oct. 2019, pp. 10312-10321.
- J. Mao et al., "Dynamic multistep reasoning based on video scene graph for video question answering," in Proc. NAACL, July 2022, pp. 3894-3904.
- M. Qi et al., "Online cross-modal scene retrieval by binary representation and semantic graph," in Proc. ACM MM, Oct. 2017, pp. 744-752.
- M. Daum et al., "VOCAL: Video organization and interactive compositional analytics," in Proc. CIDR, Jan. 2022.
- X. Chang et al., "A comprehensive survey of scene graphs: Generation and application," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, 2023, pp. 1-26. https://doi.org/10.1109/TPAMI.2021.3137605
- O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, 2015, pp. 211-252. https://doi.org/10.1007/s11263-015-0816-y
- C. Liu et al., "Beyond short-term snippet: Video relation detection with spatio-temporal global context," in Proc. IEEE/CVF CVPR, June 2020, pp. 10837-10846.
- Y. Li et al., "Interventional video relation detection," in Proc. ACM MM, Oct. 2021, pp. 4091-4099.
- X. Shang et al., "Video visual relation detection," in Proc. ACM MM, Oct. 2017, pp. 1300-1308.
- A. Vaswani et al., "Attention is all you need," in Proc. NIPS, Dec. 2017, pp. 5998-6008.
- Y.-H. H. Tsai et al., "Video relationship reasoning using gated spatio-temporal energy graph," in Proc. IEEE/CVF CVPR, June 2019, pp. 10416-10425.
- X. Qian et al., "Video relation detection with spatiotemporal graph," in Proc. ACM MM, Oct. 2019, pp. 84-93.
- T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, Apr. 2017.
- L. Bertinetto et al., "Fully-convolutional siamese networks for object tracking," in Proc. ECCVW, Oct. 2016, pp. 850-865.
- Q. Cao et al., "3-D relation network for visual relation recognition in videos," Neurocomputing, vol. 432, 2021, pp. 91-100. https://doi.org/10.1016/j.neucom.2020.12.029
- X. Shang et al., "Video visual relation detection via iterative inference," in Proc. ACM MM, Oct. 2021, pp. 3654-3663.
- S. Chen et al., "Social fabric: tubelet compositions for video relation detection," in Proc. IEEE/CVF ICCV, Oct. 2021, pp. 13465-13474.
- K. Gao et al., "Classification-then-grounding: Reformulating video scene graphs as temporal bipartite graphs," in Proc. IEEE/CVF CVPR, June 2022, pp. 19475-19484.
- C. Lu et al., "DEBUG: A dense bottom-up grounding approach for natural language video localization," in Proc. EMNLP-IJCNLP, Nov. 2019, pp. 5144-5153.
- Y. Teng et al., "Target adaptive context aggregation for video scene graph generation," in Proc. IEEE/CVF ICCV, Oct. 2021, pp. 13668-13677.
- Y. Cong et al., "Spatial-temporal transformer for dynamic scene graph generation," in Proc. IEEE/CVF ICCV, Oct. 2021, pp. 16352-16363.
- Y. Li et al., "Dynamic scene graph generation via anticipatory pre-training," in Proc. IEEE/CVF CVPR, June 2022, pp. 13864-13873.
- S. Feng et al., "Exploiting long-term dependencies for generating dynamic scene graphs," in Proc. IEEE/CVF WACV, Jan. 2023, pp. 5119-5128.
- S. Nag et al., "Unbiased scene graph generation in videos," in Proc. IEEE/CVF CVPR, June 2023, pp. 22803-22813.
- L. Xu et al., "Meta spatio-temporal debiasing for video scene graph generation," in Proc. ECCV, Oct. 2022, pp. 374-390.
- X. Shang et al., "Annotating objects and relations in user-generated videos," in Proc. ACM ICMR, June 2019, pp. 279-287.
- B. Thomee et al., "YFCC100M: The new data in multimedia research," Commun. ACM, vol. 59, no. 2, 2016, pp. 64-73. https://doi.org/10.1145/2812802
- G. A. Sigurdsson et al., "Hollywood in homes: Crowdsourcing data collection for activity understanding," in Proc. ECCV, Oct. 2016, pp. 510-526.
- J. Yang et al., "Panoptic video scene graph generation," in Proc. IEEE/CVF CVPR, June 2023, pp. 18675-18685.