3D Object Detection via Multi-Scale Feature Knowledge Distillation

  • Se-Gwon Cheon (Vision & Learning Lab, Dept. of Electrical and Computer Engineering, Inha University) ;
  • Hyuk-Jin Shin (Vision & Learning Lab, Dept. of Electrical and Computer Engineering, Inha University) ;
  • Seung-Hwan Bae (Vision & Learning Lab, Dept. of Electrical and Computer Engineering, Inha University)
  • Received : 2024.07.16
  • Reviewed : 2024.10.04
  • Published : 2024.10.31

Abstract

In this paper, we propose M3KD (Multi-Scale Feature Knowledge Distillation for 3D Object Detection), a knowledge distillation method for model compression that extracts 3D object information from the output feature maps of the teacher model and distills it into the multi-scale feature maps of the student model. To this end, M3KD minimizes the L2 loss between the feature maps at each pyramid level of the student model and the corresponding feature maps of the teacher model, so that the student mimics the teacher's backbone, which improves the student's overall accuracy. We also apply the class-logit knowledge distillation used in image classification: the student mimics the teacher's classification logits, which improves its detection accuracy. On the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, the M3KD student model achieves a 30% improvement in inference speed over the teacher model. Compared with the baseline student model, it improves 3D mean Average Precision (mAP) by 1.08% on average across all classes and difficulty levels. Furthermore, when applied on top of the recent knowledge distillation methods PKD and SemCKD, M3KD yields additional 3D mAP gains of 0.42% and 0.52%, respectively.
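
The two loss terms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each pyramid level is given as a flat list of floats, that student and teacher features at a level already have the same size, and that the two terms are combined by a weighted sum. The function names, the weights `alpha` and `beta`, and the `temperature` parameter are hypothetical; the logit term follows the common temperature-scaled KL form of logit distillation.

```python
import math


def feature_l2_loss(student_levels, teacher_levels):
    """Sum over pyramid levels of the mean squared difference (L2 loss)."""
    if len(student_levels) != len(teacher_levels):
        raise ValueError("student and teacher must have the same number of levels")
    total = 0.0
    for s, t in zip(student_levels, teacher_levels):
        if len(s) != len(t):
            raise ValueError("feature maps at a level must have the same size")
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
    return total


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened class distributions, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return kl * temperature ** 2


def distillation_loss(student_levels, teacher_levels,
                      student_logits, teacher_logits,
                      alpha=1.0, beta=1.0, temperature=2.0):
    """Weighted sum of both terms (alpha and beta are assumed weights)."""
    feat = feature_l2_loss(student_levels, teacher_levels)
    logit = logit_kd_loss(student_logits, teacher_logits, temperature)
    return alpha * feat + beta * logit
```

In practice this distillation term would be added to the student's ordinary detection loss, and feature maps of different resolution or channel count would first need an adaptation step, which the sketch leaves out.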

Keywords

Acknowledgments

This work was supported in part by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. NRF-2022R1C1C1009208) and by the Ministry of Education (No. 2022R1A6A1A03051705), and in part by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. 2022-0-00448/RS-2022-II220448: Deep Total Recall, 30%; No. RS-2022-00155915: Artificial Intelligence Convergence Innovation Human Resources Development (Inha University)).

References

  1. Jiageng Mao et al., "3D object detection for autonomous driving: A comprehensive survey," International Journal of Computer Vision, Vol. 131, No. 8, pp. 1909-1963, August 2023. DOI: 10.1007/S11263-023-01790-1
  2. Kaiming He et al., "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, Las Vegas, NV, USA, June 2016. DOI: 10.1109/CVPR.2016.90
  3. Shengchao Zhou et al., "UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5116-5125, Vancouver, BC, Canada, June 2023. DOI: 10.1109/CVPR52729.2023.00495
  4. Zhiyu Chong et al., "MonoDistill: Learning spatial features for monocular 3D object detection," arXiv preprint arXiv:2201.10830, 2022.
  5. Jia Zeng et al., "Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 992-1001, Vancouver, BC, Canada, June 2023. DOI: 10.1109/CVPR52729.2023.00102
  6. Defang Chen et al., "Knowledge distillation with the reused teacher classifier," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11923-11932, New Orleans, LA, USA, June 2022. DOI: 10.1109/CVPR52688.2022.01163
  7. G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv, March 2015. DOI: 10.48550/arXiv.1503.02531
  8. Andreas Geiger, Philip Lenz, and Raquel Urtasun, "Are we ready for autonomous driving? the KITTI vision benchmark suite," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354-3361, Providence, RI, USA, June 2012. DOI: 10.1109/CVPR.2012.6248074
  9. Weihan Cao et al., "PKD: General distillation framework for object detectors via Pearson correlation coefficient," Advances in Neural Information Processing Systems 35, pp. 15394-15406, New Orleans, LA, USA, November 2022.
  10. Can Wang et al., "SemCKD: Semantic calibration for cross-layer knowledge distillation," IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 8, pp. 6305-6319, June 2023. DOI: 10.1109/TKDE.2022.3171571
  11. Seung-Hwan Bae, "Deformable Part Region Learning and Feature Aggregation Tree Representation for Object Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, pp. 10817-10834, September 2023. DOI: 10.1109/TPAMI.2023.3268864
  12. Seung-Hwan Bae, "Deformable Part Region Learning for Object Detection," Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 1, pp. 95-103, 2022. DOI: 10.1609/AAAI.V36I1.19883
  13. Seong-Ho Lee and Seung-Hwan Bae, "AFI-GAN: Improving feature interpolation of feature pyramid networks via adversarial training for object detection," Pattern Recognition, Vol. 138, pp. 1-14, June 2023. DOI: 10.1016/J.PATCOG.2023.109365
  14. Alex H. Lang et al., "PointPillars: Fast encoders for object detection from point clouds," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697-12705, Long Beach, CA, USA, June 2019. DOI: 10.1109/CVPR.2019.01298
  15. Garrick Brazil and Xiaoming Liu, "M3D-RPN: Monocular 3D region proposal network for object detection," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9286-9295, Seoul, Korea, October 2019.
  16. Xuepeng Shi et al., "Geometry-based distance decomposition for monocular 3D object detection," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15152-15161, Montreal, QC, Canada, October 2021. DOI: 10.1109/ICCV48922.2021.01489
  17. Tsung-Yi Lin et al., "Feature pyramid networks for object detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117-2125, Honolulu, HI, USA, July 2017. DOI: 10.1109/CVPR.2017.106
  18. Shaoqing Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 6, pp. 1137-1149, June 2017. DOI: 10.1109/TPAMI.2016.2577031
  19. Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, and Raquel Urtasun, "Monocular 3D object detection for autonomous driving," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147-2156, Las Vegas, NV, USA, June 2016. DOI: 10.1109/CVPR.2016.236