Voxel Feature Attention-based Point Cloud Object Detection Algorithm for Traffic Cone

doi:10.3901/JME.2025.16.239

Abstract

Traffic cones are important markers of the boundaries of drivable road areas, and accurate, efficient cone detection is essential to the safe operation of autonomous vehicles. Existing point cloud detection methods for traffic cones extract features weakly and fail to attend to key spatial information, which results in poor robustness and accuracy. To address this, a voxel feature attention-based point cloud object detection algorithm for traffic cones, named AttenPillar, is proposed. PointPillars serves as the baseline model, an encoder-decoder structure is adopted as the backbone network, and the ReLU activation function is replaced with the GELU (Gaussian error linear unit) activation function. To capture features more effectively, a hybrid-domain attention module named PillarWise is proposed; it aggregates the features of the points within each voxel and uses a hybrid-domain attention mechanism to generate attention tensors. Hybrid-domain attention operations between these attention tensors and the input and output feature layers of the backbone reduce the loss of spatial geometric information during downsampling and increase the weights of non-empty voxels in the output feature map. The network therefore focuses more on non-empty voxels and fully extracts point features from voxels containing different numbers of points. Finally, a detection head outputs the 3D predicted boxes. A traffic cone point cloud dataset was collected and constructed to validate the algorithm. Compared with the baseline PointPillars, the proposed algorithm improves BEV AP (IoU=0.7) and 3D AP (IoU=0.7) by 10.90% and 14.41%, respectively, and runs at 78.86 frames/s on the NVIDIA AGX Xavier embedded computing device, effectively improving cone detection accuracy while maintaining real-time performance.

Key words: traffic cone detection, 3D object detection, attention mechanism, small object detection
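The abstract names two ingredients that a short example can make concrete: the GELU activation used in place of ReLU, and attention that raises the weight of non-empty voxels. The following is a minimal sketch only, not the PillarWise module itself. The GELU formula is the standard exact definition; the helper names `pillar_feature`, `attention_weight`, and `reweight`, the sigmoid-of-mean gate, and the residual form `1 + weight` are all hypothetical choices made for illustration.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU, shown for comparison with GELU."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pillar_feature(points):
    """Aggregate per-point feature vectors in one pillar by channel-wise max.

    Returns None for an empty pillar. Max pooling is the aggregation used by
    PointPillars; whether PillarWise uses it is an assumption here.
    """
    if not points:
        return None
    return [max(channel) for channel in zip(*points)]


def attention_weight(feature):
    """Hypothetical scalar gate: 0 for an empty pillar, otherwise the
    sigmoid of the mean channel value of the aggregated feature."""
    if feature is None:
        return 0.0
    return sigmoid(sum(feature) / len(feature))


def reweight(backbone_cell, weight):
    """Hypothetical residual gating: scale a backbone feature vector by
    (1 + weight), so empty pillars pass through unchanged and non-empty
    pillars are amplified."""
    return [value * (1.0 + weight) for value in backbone_cell]


# Toy usage: one non-empty pillar with two points, and one empty pillar.
occupied = pillar_feature([[1.0, 5.0], [3.0, 2.0]])
empty = pillar_feature([])
print(reweight([2.0, 4.0], attention_weight(occupied)))
print(reweight([2.0, 4.0], attention_weight(empty)))
print(relu(-1.0), gelu(-1.0))
```

In this sketch the empty pillar keeps its backbone feature unchanged while the occupied pillar is amplified, which mirrors the stated goal of increasing the weight of non-empty voxels. Unlike ReLU, GELU returns a small negative value for negative inputs instead of clamping them to zero.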

CLC number:

U495
TP391

LIAN Qiuyou, ZHENG Shaowu, TU Xinkui, LI Weihua. Voxel Feature Attention-based Point Cloud Object Detection Algorithm for Traffic Cone[J]. Journal of Mechanical Engineering, 2025, 61(16): 239-249.
