High-Frequency Interchange Network for Multispectral Object Detection

  • Park, Seon-Hoo (Department of SW Engineering, Chonnam National University) ;
  • Yun, Jun-Seok (Department of AI Convergence, Chonnam National University) ;
  • Yoo, Seok Bong (Department of AI Convergence, Chonnam National University) ;
  • Han, Seunghwoi (School of Mechanical Engineering, Chonnam National University)
  • Received : 2022.07.13
  • Accepted : 2022.08.05
  • Published : 2022.08.31

Abstract

Object recognition is carried out using RGB images in a wide range of studies. However, RGB images captured in dark illumination environments, or in environments where target objects are occluded by other objects, lead to poor object recognition performance. IR images, on the other hand, provide robust object recognition in these environments because they sense infrared radiation rather than visible light. In this paper, we propose an RGB-IR fusion model, the high-frequency interchange network (HINet), which improves object recognition performance by combining only the strengths of RGB-IR image pairs. HINet connects two object detection models through a mutual high-frequency transfer (MHT) module to interchange the advantages of the RGB and IR images. MHT converts each image of an RGB-IR pair into the discrete cosine transform (DCT) spectral domain to extract high-frequency information. The extracted high-frequency information is transferred to the other network and used to improve object recognition performance. Experimental results show the superiority of the proposed network and demonstrate improved performance on the multispectral object recognition task.
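As a rough illustration of the high-frequency extraction step described in the abstract, the low-frequency corner of a 2-D DCT spectrum can be zeroed out and the result transformed back to the spatial domain. This is a minimal sketch only: the function names, the `cutoff` parameter, and the corner-masking strategy are our own assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix ("ortho" normalization).
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] *= np.sqrt(1.0 / n)
    c[1:] *= np.sqrt(2.0 / n)
    return c

def extract_high_frequency(image, cutoff=0.25):
    """Zero the low-frequency (top-left) corner of the 2-D DCT
    spectrum and invert. `cutoff` is the fraction of each axis
    treated as low frequency (an illustrative choice)."""
    h, w = image.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    spectrum = ch @ image @ cw.T                        # forward 2-D DCT
    spectrum[: int(h * cutoff), : int(w * cutoff)] = 0  # mask low frequencies
    return ch.T @ spectrum @ cw                         # inverse 2-D DCT
```

Because the 2-D DCT concentrates smooth (low-frequency) content in the top-left corner of the spectrum, masking that corner leaves only edge- and texture-like detail, which is the kind of information MHT exchanges between the RGB and IR branches.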

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2020R1G1A1100798) and by the Vehicle AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA), funded by the Ministry of Science and ICT (No. S0315-21-1001).

References

  1. J. Liu, S. Zhang, S. Wang, and D. N. Metaxas, "Multispectral Deep Neural Networks for Pedestrian Detection," in Proceedings of the British Machine Vision Conference, York, UK, pp. 13, 2016.
  2. K. Park, S. Kim, and K. Sohn, "Unified multi-spectral pedestrian detection based on probabilistic fusion networks," Pattern Recognition, vol. 80, pp. 143-155, Aug. 2018. https://doi.org/10.1016/j.patcog.2018.03.007
  3. C. Li, D. Song, R. Tong, and M. Tang, "Multispectral Pedestrian Detection Via Simultaneous Detection and Segmentation," in Proceedings of the British Machine Vision Conference, Hangzhou, China, pp. 225, 2018.
  4. C. Li, D. Song, R. Tong, and M. Tang, "Illumination-aware faster R-CNN for robust multispectral pedestrian detection," Pattern Recognition, vol. 85, pp. 161-171, Jan. 2019. https://doi.org/10.1016/j.patcog.2018.08.005
  5. L. Zhang, Z. Liu, S. Zhang, X. Yang, H. Qiao, K. Huang, and A. Hussain, "Cross-modality interactive attention network for multispectral pedestrian detection," Information Fusion, vol. 50, pp. 20-29, 2019. https://doi.org/10.1016/j.inffus.2018.09.015
  6. H. Zhang, E. Fromont, S. Lefevre, and B. Avignon, "Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks," in Proceedings of the International Conference on Image Processing, Abu Dhabi, United Arab Emirates, pp. 276-280, 2020.
  7. H. Zhang, E. Fromont, S. Lefevre, and B. Avignon, "Guided Attentive Feature Fusion for Multispectral Pedestrian Detection," in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa: HI, USA, pp. 72-80, 2021.
  8. Y. Chen, J. Shi, Z. Ye, C. Mertz, D. Ramanan, and S. Kong, "Multimodal object detection via bayesian fusion," arXiv preprint arXiv:2104.02904, 2021.
  9. M. Sharma, M. Dhanaraj, S. Karnam, D. G. Chachlakis, R. Ptucha, P. P. Markopoulos, and E. Saber, "YOLOrs: Object detection in multimodal remote sensing imagery," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 1497-1508, Nov. 2021. https://doi.org/10.1109/JSTARS.2020.3041316
  10. H. Perreault, G. Bilodeau, N. Saunier, and M. Heritier, "FFAVOD: Feature fusion architecture for video object detection," Pattern Recognition Letter, vol. 151, pp. 294-301, Nov. 2021. https://doi.org/10.1016/j.patrec.2021.09.002
  11. R. Guo, D. Li, and Y. Han, "Deep multi-scale and multi-modal fusion for 3d object detection," Pattern Recognition Letter, vol. 151, pp. 236-242, Nov. 2021. https://doi.org/10.1016/j.patrec.2021.08.028
  12. FLIR Team, Free FLIR Thermal Dataset for Algorithm Training [Internet]. Available: https://www.flir.com/oem/adas/adas-dataset-form/.
  13. X. Jia, C. Zhu, M. Li, W. Tang, and W. Zhou, "LLVIP: A Visible-infrared Paired Dataset for Low-light Vision," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal: QC, Canada, pp. 3496-3504, 2021.
  14. G. Jocher, K. Nishimura, T. Mineeva, and R. Vilarino, YOLOv5, Jul. 2020 [Internet]. Available: https://github.com/ultralytics/yolov5.
  15. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas: NV, USA, pp. 770-778, 2016.
  16. S. J. Lee, J. S. Yun, and S. B. Yoo, "Alternative collaborative learning for character recognition in low-resolution images," IEEE Access, vol. 10, pp. 22003- 22017, Feb. 2022. https://doi.org/10.1109/ACCESS.2022.3153116
  17. R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus: OH, USA, pp. 580-587, 2014.
  18. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, Jun. 2017. https://doi.org/10.1109/TPAMI.2016.2577031
  19. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, "SSD: Single Shot Multibox Detector," in Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, vol. 9905, pp. 21-37, 2016.
  20. J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas: NV, USA, pp. 779-788, 2016.
  21. S. J. Lee and S. B. Yoo, "Super-resolved recognition of license plate characters," Mathematics, vol. 9, no. 19, pp. 2494(1)-2494(19), Oct. 2021.
  22. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," in Proceedings of the Neural Information Processing Systems, pp. 5998-6008, 2017.
  23. F. Qingyun, H. Dapeng, and W. Zhaokui, "Cross-Modality Fusion Transformer for Multispectral Object Detection," arXiv preprint arXiv:2111.00273, 2021.
  24. J. Wagner, V. Fischer, M. Herman, and S. Behnke, "Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks," in Proceedings of the 24th European Symposium on Artificial Neural Networks, Bruges, Belgium, pp. 509-514, 2016.
  25. L. Zhang, X. Zhu, X. Chen, X. Yang, Z. Lei, and Z. Liu, "Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp. 5126-5136, 2019.
  26. Y. Zheng, I. H. Izzat, and S. Ziaee, "GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection," arXiv preprint arXiv:1903.06999, 2019.
  27. J. S. Yun, S. J. Lee, S. B. Yoo, and S. Han, "Hybrid-Domain High-Frequency Attention Network for Arbitrary Magnification Super-Resolution," Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 11, pp. 1477-1485, 2021. https://doi.org/10.6109/JKIICE.2021.25.11.1477
  28. J. S. Yun and S. B. Yoo, "Single Image Super-Resolution with Arbitrary Magnification Based on High-Frequency Attention Network," Mathematics, vol. 10, no. 2, pp. 275(1)-275(19), Jan. 2022.
  29. Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, "Image Super-Resolution Using Very Deep Residual Channel Attention Networks," in Proceedings of the European conference on computer vision, Salt Lake City: UT, USA, pp. 286-301, 2018.