Research on 3D Object Detection Based on Laser Point Cloud and Image Fusion

doi:10.3901/JME.2022.24.289

Abstract

3D object detection based on the fusion of lidar and camera data has received extensive attention. However, most fusion algorithms struggle to accurately detect small objects such as pedestrians and cyclists. To address this, a feature fusion network based on the self-attention mechanism is proposed, which makes full use of local feature information to achieve accurate 3D object detection. First, to reduce the spatial search range of the point cloud, an improved Faster R-CNN generates candidate boxes, and the frustum point cloud is extracted according to the projection relationship between the lidar and the camera. Second, a Self-Attention PointNet is proposed to segment the raw point cloud within the frustum. Finally, PointNet and T-Net are used to predict the 3D bounding box parameters, and a regularization term is added to the loss function to achieve higher convergence accuracy. The method is verified and tested on the KITTI dataset. The results show that it clearly outperforms F-PointNet, with substantially improved detection accuracy for cars, pedestrians, and cyclists, and that it achieves higher accuracy than mainstream 3D object detection networks.
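The frustum extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the points are already expressed in the camera frame (z pointing forward), that a single 3x4 projection matrix `P` maps them to pixel coordinates, and that the candidate box is an axis-aligned rectangle in the image. The names `project_point` and `frustum_points` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project_point(P: Sequence[Sequence[float]], point: Point) -> Tuple[float, float, float]:
    """Project a 3D point with a 3x4 matrix P and return (u, v, depth)."""
    x, y, z = point
    hom = (x, y, z, 1.0)  # homogeneous coordinates
    a, b, w = (sum(row[i] * hom[i] for i in range(4)) for row in P)
    if w <= 0.0:
        # Point lies behind the image plane; pixel coordinates are undefined.
        return (float("nan"), float("nan"), w)
    return (a / w, b / w, w)


def frustum_points(P: Sequence[Sequence[float]],
                   points: Sequence[Point],
                   box: Box2D) -> List[Point]:
    """Keep the points whose projection falls inside the 2D candidate box."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points:
        u, v, depth = project_point(P, p)
        if depth > 0.0 and u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Assumed toy camera: focal length 100 px, principal point (50, 50).
    P = [[100.0, 0.0, 50.0, 0.0],
         [0.0, 100.0, 50.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    cloud = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -10.0)]
    print(frustum_points(P, cloud, (40.0, 40.0, 60.0, 60.0)))
```

Only the points that project into the box are passed to later stages, which is how a 2D detection narrows the 3D search space. With real lidar data, the points would first have to be transformed from the lidar frame into the camera frame using the sensor calibration.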

Key words: lidar, 3D object detection, point cloud fusion, attention mechanism, deep learning

CLC Number: TG156

LIU Yonggang, YU Fengning, ZHANG Xinjie, CHEN Zheng, QIN Datong. Research on 3D Object Detection Based on Laser Point Cloud and Image Fusion[J]. Journal of Mechanical Engineering, 2022, 58(24): 289-299.


URL: http://www.cjmenet.com.cn/EN/10.3901/JME.2022.24.289

http://www.cjmenet.com.cn/EN/Y2022/V58/I24/289
