Human-Object Interaction Detection Data Augmentation Using Image Concatenation

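  • 이상백 (Sang-Baek Lee; Division of Computer Convergence, Chungnam National University) ;
  • 이규철 (Kyu-Chul Lee; Division of Computer Convergence, Chungnam National University)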
  • Received : 2022.08.30
  • Accepted : 2022.10.21
  • Published : 2023.02.28

Abstract

Human-object interaction (HOI) detection requires both object detection and interaction recognition, and training a detection model requires a large amount of data. The publicly available datasets are too small to train such models adequately. In this paper, we propose two easy and effective data augmentation methods for human-object interaction detection, Simple Quattro Augmentation (SQA) and Random Quattro Augmentation (RQA). We show that the proposed methods can be easily integrated into state-of-the-art HOI detection models on the HICO-DET dataset.

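Human-object interaction detection must solve object detection and interaction recognition together, so training a detection model requires a large amount of data. Because the publicly available datasets are limited in scale, demand for data augmentation techniques is growing, yet most studies still rely on augmentation techniques borrowed from object detection and image segmentation. This study therefore examines the characteristics of the datasets used for human-object interaction detection and, based on them, proposes a data augmentation technique that is effective for improving the performance of human-object interaction detection models. To validate the proposed technique, we built an experimental environment and applied the augmentation to existing models, confirming that it can improve detection performance.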
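
The abstract does not describe how SQA or RQA assemble their training images, so the following is only an illustrative sketch of the general idea of augmentation by image concatenation, not the authors' implementation. It assumes that four equally sized images are tiled into a 2x2 grid and that each HOI annotation's human and object boxes, given as [x1, y1, x2, y2], are shifted by the offset of their tile. The function name `concat_quattro` and the annotation keys `human_box`, `object_box`, and `verb` are hypothetical.

```python
import numpy as np


def concat_quattro(images, annotations):
    """Tile four same-sized images into a 2x2 grid and shift HOI boxes.

    Hypothetical helper for illustration only. Boxes are [x1, y1, x2, y2]
    in pixel coordinates of their own source image.
    """
    if len(images) != 4 or len(annotations) != 4:
        raise ValueError("expected exactly four images and four annotation lists")
    h, w = images[0].shape[:2]
    if any(img.shape[:2] != (h, w) for img in images):
        raise ValueError("this sketch assumes all four images share one size")

    # Build the canvas: images 0 and 1 on the top row, 2 and 3 on the bottom.
    top = np.concatenate([images[0], images[1]], axis=1)
    bottom = np.concatenate([images[2], images[3]], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)

    # (dx, dy) offset of each tile's top-left corner inside the canvas.
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]

    merged = []
    for (dx, dy), anns in zip(offsets, annotations):
        shift = np.array([dx, dy, dx, dy])
        for ann in anns:
            merged.append({
                "human_box": (np.asarray(ann["human_box"]) + shift).tolist(),
                "object_box": (np.asarray(ann["object_box"]) + shift).tolist(),
                "verb": ann["verb"],  # interaction label is unchanged
            })
    return canvas, merged
```

Because every human-object pair stays inside its own tile, the interaction labels carry over unchanged and only the box coordinates need to move. A randomized variant could, for example, sample which four images to combine, but the abstract gives no details on that either.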

Acknowledgement

This research was supported by the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) (No.2022-0-00817).
