Acknowledgement
This work was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under grants funded by the Korean government (MSIT) (No. 2021-0-00891, Development of AI Service Integrated Framework for Autonomous Driving) and the Korean government (MSIP) (No. 2020-0-00002, Development of Standard SW Platform-Based Autonomous Driving Technology to Solve Social Problems of Mobility and Safety for Marginalized Public Transport Communities).
References
- D. Choi, S.-J. Han, K.-W. Min, and J. Choi, PathGAN: Local path planning with attentive generative adversarial networks, ETRI J. 44 (2022), no. 6, 1004-1019. https://doi.org/10.4218/etrij.2021-0192
- S.-J. Han, J. Kang, K.-W. Min, and J. Choi, DiLO: Direct light detection and ranging odometry based on spherical range images for autonomous driving, ETRI J. 43 (2021), no. 4, 603-616. https://doi.org/10.4218/etrij.2021-0088
- J. Kang, S.-J. Han, N. Kim, and K.-W. Min, ETLi: Efficiently annotated traffic Lidar dataset using incremental and suggestive annotation, ETRI J. 43 (2021), no. 4, 630-639. https://doi.org/10.4218/etrij.2021-0055
- Y. Zhou and O. Tuzel, VoxelNet: End-to-end learning for point cloud based 3D object detection, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA), 2018, pp. 4490-4499.
- H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, nuScenes: A multimodal dataset for autonomous driving, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA), 2020, pp. 11621-11631.
- A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, Vision meets robotics: The KITTI dataset, Int. J. Robot. Res. 32 (2013), no. 11, 1231-1237. https://doi.org/10.1177/0278364913491297
- P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, Scalability in perception for autonomous driving: Waymo open dataset, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA), 2020, pp. 2446-2454.
- B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes, D. Ramanan, P. Carr, and J. Hays, Argoverse 2: Next generation datasets for self-driving perception and forecasting, arXiv preprint, 2023. https://doi.org/10.48550/arXiv.2301.00493
- J. Lambert, A. Carballo, A. M. Cano, P. Narksri, D. R. Wong, E. Takeuchi, and K. Takeda, Performance analysis of 10 models of 3D LiDARs for automated driving, IEEE Access 8 (2020), 131699-131722. https://doi.org/10.1109/ACCESS.2020.3009680
- Q. Xu, Y. Zhou, W. Wang, C. R. Qi, and D. Anguelov, SPG: Unsupervised domain adaptation for 3D object detection via semantic point generation, (IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada), 2021, pp. 15446-15456.
- X. Bai, Z. Hu, X. Zhu, Q. Huang, Y. Chen, H. Fu, and C.-L. Tai, TransFusion: Robust Lidar-camera fusion for 3D object detection with transformers, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA), 2022, pp. 1090-1099.
- X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, Multi-View 3D object detection network for autonomous driving, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA), 2017, pp. 1907-1915.
- Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. Rus, and S. Han, BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation, (IEEE International Conference on Robotics and Automation (ICRA), London, UK), 2023.
- AIHub, 2023. Available from: https://aihub.or.kr/ [last accessed February 2023].
- C. R. Qi, H. Su, K. Mo, and L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA), 2017, pp. 652-660.
- Y. Yan, Y. Mao, and B. Li, SECOND: Sparsely embedded convolutional detection, Sensors 18 (2018), no. 10, 3337.
- A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, PointPillars: Fast encoders for object detection from point clouds, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA), 2019, pp. 12697-12705.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, (International Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada), 2019, pp. 8026-8037.
- NVIDIA TensorRT, 2023. Available from: https://developer.nvidia.com/tensorrt [last accessed February 2023].
- Y. Zhou, P. Sun, Y. Zhang, D. Anguelov, J. Gao, T. Y. Ouyang, J. Guo, J. Ngiam, and V. Vasudevan, End-to-end multi-view fusion for 3D object detection in LiDAR point clouds, (Conference on Robot Learning (CoRL), Osaka, Japan), 2019, pp. 923-932.
- T. Yin, X. Zhou, and P. Krahenbuhl, Center-based 3D object detection and tracking, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA), 2021, pp. 11784-11793.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (2017), no. 6, 84-90. https://doi.org/10.1145/3065386
- K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, (International Conference on Learning Representations (ICLR), CA, USA), 2015.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA), 2015, pp. 1-9.
- K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA), 2016, pp. 770-778.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: A large-scale hierarchical image database, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA), 2009, pp. 248-255.
- M. Everingham, L. Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) challenge, Int. J. Comput. Vis. 88 (2010), no. 2, 303-338.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, Microsoft COCO: Common objects in context, European Conference on Computer Vision (ECCV), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, (eds.), Springer International Publishing, Zurich, Switzerland, 2014, pp. 740-755.
- G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA), 2017, pp. 4700-4708.
- Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, A ConvNet for the 2020s, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA), 2022, pp. 11976-11986.
- S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA), 2017, pp. 1492-1500.
- M. Tan and Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, (International Conference on Machine Learning (ICML), California, USA), 2019, pp. 6105-6114.
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv Preprint, 2017. http://arxiv.org/abs/1704.04861
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, An image is worth 16×16 words: Transformers for image recognition at scale, (International Conference on Learning Representations (ICLR), Vienna, Austria), 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, (IEEE/CVF International Conference on Computer Vision (ICCV)), 2021, pp. 10012-10022.
- D. Lee, D.-W. Kang, J. Kang, J.-Y. Kim, K.-W. Min, J.-H. Park, K.-B. Sung, Y.-S. Song, T.-H. An, Y.-W. Jo, D. Choi, J.-D. Choi, and S.-J. Han, Apparatus for recognizing object of automated driving system using error removal based on object classification and method using the same, Tech. Report US11507783B2. U.S. Patent. USA, 2022.
- P. Xiao, S. Shao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, Y. Wang, and D. Yang, PandaSet: Advanced sensor suite dataset for autonomous driving, (IEEE Intelligent Transportation Systems Conference (ITSC), Indianapolis, USA), 2021, pp. 3095-3101.
- E. Li, S. Wang, C. Li, D. Li, X. Wu, and Q. Hao, SUSTech POINTS: A portable 3D point cloud interactive annotation platform system, (IEEE Intelligent Vehicles Symposium (IV), Nevada, USA), 2020, pp. 1108-1115.
- H. Jung, Sensor fusion multi-object tracking and prediction data, 2021. Available from: https://aihub.or.kr/ [last accessed February 2023].
- M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, The Pascal Visual Object Classes challenge: A retrospective, Int. J. Comput. Vis. 111 (2014), 98-136. https://doi.org/10.1007/s11263-014-0733-5
- L. Du, X. Ye, X. Tan, E. Johns, B. Chen, E. Ding, X. Xue, and J. Feng, AGO-Net: Association-guided 3D point cloud object detection network, IEEE Trans. Pattern Anal. Machine Intell. 44 (2022), no. 11, 8097-8109.
- I. Loshchilov and F. Hutter, Decoupled weight decay regularization, (International Conference on Learning Representations (ICLR), LA, USA), 2019.
- A. Gupta, P. Dollar, and R. Girshick, LVIS: A dataset for large vocabulary instance segmentation, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA), 2019, pp. 5356-5364.
- I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, (International Conference on Machine Learning (ICML), Georgia, USA), 2013, pp. 1139-1147.
- K. H. Brodersen, C. S. Ong, K. E. Stephan, and J. M. Buhmann, The balanced accuracy and its posterior distribution, (International Conference on Pattern Recognition (ICPR), Istanbul, Turkey), 2010, pp. 3121-3124.
- Y. Wang, X. Chen, Y. You, L. E. Li, B. Hariharan, M. Campbell, K. Q. Weinberger, and W.-L. Chao, Train in Germany, Test in the USA: Making 3D object detectors generalize, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA), 2020, pp. 11713-11723.
- J. Yang, S. Shi, Z. Wang, H. Li, and X. Qi, ST3D++: Denoised self-training for unsupervised domain adaptation on 3D object detection, IEEE Trans. Pattern Anal. Machine Intell. 45 (2023), no. 5, 6354-6371.
- OpenPCDet: An open-source toolbox for 3D object detection from point clouds, 2020. Available from: https://github.com/open-mmlab/OpenPCDet [last accessed July 2023].
- E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, Randaugment: Practical automated data augmentation with a reduced search space, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, USA), 2020, pp. 702-703.
- J. Ngiam, B. Caine, W. Han, B. Yang, Y. Chai, P. Sun, Y. Zhou, X. Yi, O. Alsharif, P. Nguyen, Z. Chen, J. Shlens, and V. Vasudevan, StarNet: Targeted computation for object detection in point clouds, arXiv preprint, 2019. https://arxiv.org/abs/1908.11069
- Z. Yang, Y. Zhou, Z. Chen, and J. Ngiam, 3D-MAN: 3D multi-frame attention network for object detection, (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), virtual), 2021, pp. 1863-1872.