• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2025, Vol. 61 ›› Issue (16): 239-249. doi: 10.3901/JME.2025.16.239

• Transportation Engineering •

Voxel Feature Attention-based Point Cloud Object Detection Algorithm for Traffic Cone

LIAN Qiuyou1, ZHENG Shaowu1, TU Xinkui1, LI Weihua1,2

  1. School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou 510641;
  2. Guangdong Artificial Intelligence and Digital Economy Laboratory, Guangzhou 510335
  • Accepted: 2024-08-25 Online: 2025-03-11 Published: 2025-03-11
  • About the authors: LIAN Qiuyou (练秋酉), male, born in 1998. His main research interest is 3D object detection. E-mail: 202120101128@mail.scut.edu.cn; ZHENG Shaowu (郑少武), male, born in 1996, PhD candidate. His main research interests are environment perception for intelligent driving and multi-sensor fusion. E-mail: mezhengsw@mail.scut.edu.cn; TU Xinkui (涂新奎), male, born in 1998, master's student. His main research interest is 3D object detection in point clouds. E-mail: 763149484@qq.com; LI Weihua (李巍华, corresponding author), male, born in 1973, PhD, professor, and doctoral supervisor. His main research interests are intelligent driving, industrial intelligence, industrial big data, and intelligent operation and maintenance of equipment. E-mail: whlee@scut.edu.cn
  • Funding:
    Supported by the Guangzhou Key-Area Research and Development Program (202206030005)

Abstract: Traffic cones are important markers of the boundaries of drivable road areas, and accurate, efficient cone detection is essential for the safe operation of autonomous vehicles. Existing point cloud methods for traffic cone detection have weak feature extraction and cannot attend to key spatial information, which leads to poor robustness and accuracy. To address this, a voxel feature attention-based point cloud object detection algorithm for traffic cones, named AttenPillar, is proposed. PointPillars serves as the baseline model, an encoder-decoder structure is adopted as the backbone network, and the ReLU activation function is replaced with GELU (Gaussian error linear unit). To capture features more effectively, a hybrid-domain attention module, PillarWise, is proposed; it aggregates the features of the points in each voxel and uses a hybrid-domain attention mechanism to generate attention tensors. Applying hybrid-domain attention operations between these attention tensors and the input and output feature layers of the backbone reduces the loss of spatial geometric information during downsampling and increases the weights of non-empty voxels in the output feature map. The network thus focuses more on non-empty voxels and fully extracts point features from voxels containing different numbers of points. Finally, the detection head outputs 3D prediction boxes. A traffic cone point cloud dataset was collected and constructed to validate the algorithm. Compared with the PointPillars baseline, the proposed algorithm improves BEV AP (IoU=0.7) and 3D AP (IoU=0.7) by 10.90% and 14.41%, respectively, and runs at 78.86 FPS on the NVIDIA AGX Xavier embedded computing device, effectively improving cone detection accuracy while maintaining real-time performance.
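Two ideas in the abstract can be illustrated with a small sketch: the GELU activation that replaces ReLU, and the notion of weighting non-empty voxels (pillars) more heavily. The GELU formula below is the standard definition, x·Φ(x). Everything else is an assumption made for illustration only and is not the paper's PillarWise module: the function names `pillar_attention` and `reweight`, the max-pooling aggregation, the sigmoid gate, the residual form `1 + w`, and the use of scalar features.

```python
import math


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pillar_attention(pillars):
    """Hypothetical per-pillar attention weight (illustrative assumption).

    `pillars` is a list of pillars; each pillar is a list of scalar
    per-point features. An empty pillar gets weight 0.0; a non-empty
    pillar gets sigmoid(max-pooled point feature).
    """
    weights = []
    for points in pillars:
        if not points:
            weights.append(0.0)
        else:
            weights.append(sigmoid(max(points)))
    return weights


def reweight(feature_map, weights):
    """Scale each pillar feature by (1 + weight).

    The residual form is an assumption: empty pillars keep their
    feature unchanged, non-empty pillars are amplified.
    """
    return [f * (1.0 + w) for f, w in zip(feature_map, weights)]


# Toy input: three pillars, the second one empty.
pillars = [[0.2, 1.5, -0.3], [], [0.0]]
weights = pillar_attention(pillars)

# Toy backbone outputs passed through GELU instead of ReLU.
features = [gelu(v) for v in [1.0, 1.0, -1.0]]
out = reweight(features, weights)
```

Unlike ReLU, GELU returns a small negative value for moderately negative inputs rather than exactly zero, so those inputs still contribute a gradient. In this toy setup the feature of the empty pillar passes through unchanged while the features of the non-empty pillars are scaled up, which mirrors, in a much simplified form, the stated goal of increasing the weight of non-empty voxels.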

Key words: traffic cone detection, 3D object detection, attention mechanism, small object detection

CLC number: