Voxel Feature Attention-based Point Cloud Object Detection Algorithm for Traffic Cone

doi:10.3901/JME.2025.16.239

Abstract

Traffic cones mark the boundaries of drivable areas on roads, so detecting them accurately and efficiently is important for the safe navigation of autonomous vehicles. Existing methods extract features weakly and cannot focus on key spatial information, which leads to poor robustness and accuracy. To address these issues, this paper presents AttenPillar, a voxel feature attention-based point cloud object detection algorithm for traffic cones. PointPillars serves as the baseline model, an encoder-decoder structure is adopted as the backbone network, and the ReLU activation function is replaced by the GELU (Gaussian Error Linear Unit) activation function. To better capture features, a PillarWise hybrid domain attention module is proposed; it aggregates the features of the points in each voxel and uses the hybrid domain attention mechanism to generate attention tensors. Applying hybrid domain attention operations between these attention tensors and the input and output feature layers of the network backbone reduces the loss of spatial geometric information during downsampling and increases the weights of non-empty voxels in the output feature map. The network therefore focuses more on the non-empty voxels and fully extracts point cloud features from voxels containing different numbers of points. Finally, the detection head outputs the 3D prediction boxes. A traffic cone point cloud dataset was collected and constructed to verify the algorithm. Compared with the baseline model PointPillars, the proposed algorithm improves BEV AP (IoU=0.7) and 3D AP (IoU=0.7) by 10.90% and 14.41%, respectively, and runs at 78.86 FPS on the embedded computing device NVIDIA AGX Xavier, improving the accuracy of cone detection while maintaining real-time performance.
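The two ideas named in the abstract, the GELU activation and attention that favors non-empty voxels, can be illustrated with a minimal sketch. It is not the AttenPillar implementation: the abstract does not specify the module's internals, so the max-pooling aggregation, the sigmoid gating, and every function and variable name below (`aggregate_pillar`, `hybrid_attention`, `channel_w`, `spatial_w`) are assumptions made for illustration. "Hybrid domain" is read here as one weight per channel combined with one weight per pillar.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_pillar(points, num_channels):
    """Max-pool point features inside one pillar (assumed aggregation).

    An empty pillar yields an all-zero feature vector.
    """
    if not points:
        return [0.0] * num_channels
    return [max(p[c] for p in points) for c in range(num_channels)]


def hybrid_attention(pillars, num_channels):
    """Reweight pillar features by channel and by pillar (hypothetical sketch).

    Returns (weighted features, channel weights, pillar weights).
    Empty pillars receive a pillar weight of exactly 0.
    """
    feats = [aggregate_pillar(p, num_channels) for p in pillars]
    occupied = [len(p) > 0 for p in pillars]
    n_occ = sum(occupied)
    if n_occ == 0:
        return feats, [0.0] * num_channels, [0.0] * len(pillars)

    # Channel domain: one weight per channel, from the mean over non-empty pillars.
    channel_w = [
        sigmoid(sum(f[c] for f, o in zip(feats, occupied) if o) / n_occ)
        for c in range(num_channels)
    ]
    # Spatial domain: one weight per pillar, from the mean over its channels.
    spatial_w = [
        sigmoid(sum(f) / num_channels) if o else 0.0
        for f, o in zip(feats, occupied)
    ]
    # Apply GELU to each feature, then scale by both attention weights.
    out = [
        [gelu(f[c]) * channel_w[c] * spatial_w[i] for c in range(num_channels)]
        for i, f in enumerate(feats)
    ]
    return out, channel_w, spatial_w


# Three pillars with two feature channels; the middle pillar is empty.
pillars = [[[1.0, -2.0], [0.5, 3.0]], [], [[-1.0, 0.2]]]
weighted, channel_w, spatial_w = hybrid_attention(pillars, num_channels=2)
```

In this toy input the empty pillar contributes nothing to the channel statistics and is weighted by zero, which mirrors the abstract's stated goal of shifting weight toward non-empty voxels. A learned module would replace the fixed mean-and-sigmoid gates with trainable layers.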

Key words: traffic cone detection, 3D object detection, attention mechanism, small object detection

CLC Number: U495; TP391

LIAN Qiuyou, ZHENG Shaowu, TU Xinkui, LI Weihua. Voxel Feature Attention-based Point Cloud Object Detection Algorithm for Traffic Cone[J]. Journal of Mechanical Engineering, 2025, 61(16): 239-249.
