Research on Detection Method for Driving Scenarios Based on Multi-stage Parameter Fusion Network

doi:10.3901/JME.2024.10.064

Journal of Mechanical Engineering, 2024, Vol. 60, Issue (10): 64-75. doi: 10.3901/JME.2024.10.064

LIN Chen¹, HE Zhicheng¹,², HUANG Yifei³, LIN Zhigui², FU Guang², HUANG Jin⁴

1. State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Hunan University, Changsha 410082;
2. Technical Center, SAIC-GM-Wuling Automobile Co., Ltd. (SGMW), Liuzhou 545007;
3. Department of Electromechanical Engineering, University of Macau, Macao 999078;
4. School of Vehicle and Mobility, Tsinghua University, Beijing 100084

Received: 2023-12-05 Revised: 2024-02-05 Online: 2024-05-20 Published: 2024-07-24
About the authors: LIN Chen, male, born in 1995, doctoral candidate. His main research interest is environment perception technology for intelligent vehicles.
E-mail: linchen132@hnu.edu.cn
HE Zhicheng (corresponding author), male, born in 1983, PhD, professor, doctoral supervisor. His main research interests are intelligent vehicles and intelligent control, and advanced structures and intelligent design.
E-mail: hezhicheng815@163.com
Funding:
Supported by the Joint Funds of the National Natural Science Foundation of China (U20A20285), the Guangxi Science and Technology Major Project (2021AA04004), and the Liuzhou Science and Technology Major Project (Y2021AA0101A033).

Abstract: Deep-learning-based object detection methods struggle to meet accuracy and speed requirements at the same time when deployed on intelligent vehicle controllers. This paper therefore proposes a multi-stage parameter fusion object detection method for driving scenarios that improves detection speed and accuracy together. First, a multi-stage branching structure is designed to build the model. To speed up inference, a multi-stage parameter fusion method is introduced that converts the multi-stage structure layers into a single equivalent convolution-batch normalization layer, greatly reducing the number of model parameters while leaving the model's generalization ability unchanged. Second, to improve detection accuracy, an SSIoU (Soft scaled intersection of union) bounding box loss and a united semi-anchor-free label assignment algorithm are proposed, which improve the model's adaptability to driving scenarios. Finally, experiments on the DAIR-V2X-V dataset show that, compared with the state-of-the-art YOLO (You only look once) algorithm, the proposed multi-stage parameter fusion model improves detection accuracy (mean average precision, mAP) by 9.89% and inference speed (frames per second, FPS) by 51.89%.

Key words: intelligent vehicle, object detection, parameter fusion, SSIoU, YOLO algorithm
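
The abstract's idea of replacing a multi-branch structure with one equivalent convolution-batch normalization layer can be illustrated with a generic structural re-parameterization sketch. This is not the paper's fusion scheme, whose details are not given here. It assumes each branch is a 1x1 convolution followed by inference-mode batch normalization, that the parallel branches are summed, and that the input is a single pixel (a channel vector). All function and variable names are hypothetical.

```python
import math
import random

def conv1x1(x, weight, bias):
    """1x1 convolution at one pixel: a linear map over the channel vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return [(v - m) * g / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN scale and shift into the preceding convolution."""
    scales = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_w = [[k * w for w in row] for k, row in zip(scales, weight)]
    fused_b = [b + (c - m) * k for b, c, m, k in zip(beta, bias, mean, scales)]
    return fused_w, fused_b

def make_branch(rng, c_in, c_out):
    """Random conv + BN parameters for one hypothetical branch."""
    return {
        "weight": [[rng.gauss(0, 1) for _ in range(c_in)] for _ in range(c_out)],
        "bias": [rng.gauss(0, 1) for _ in range(c_out)],
        "gamma": [rng.uniform(0.5, 1.5) for _ in range(c_out)],
        "beta": [rng.gauss(0, 1) for _ in range(c_out)],
        "mean": [rng.gauss(0, 1) for _ in range(c_out)],
        "var": [rng.uniform(0.5, 1.5) for _ in range(c_out)],
    }

rng = random.Random(0)
c_in, c_out = 4, 3
x = [rng.gauss(0, 1) for _ in range(c_in)]
branches = [make_branch(rng, c_in, c_out) for _ in range(3)]

# Multi-branch form: run every conv-BN branch and sum the outputs.
branch_outputs = [
    batch_norm(conv1x1(x, p["weight"], p["bias"]),
               p["gamma"], p["beta"], p["mean"], p["var"])
    for p in branches
]
y_multi = [sum(vals) for vals in zip(*branch_outputs)]

# Fused form: fold each BN into its conv, then add weights and biases.
folded = [fold_bn(**p) for p in branches]
w_fused = [[sum(ws) for ws in zip(*rows)] for rows in zip(*(w for w, _ in folded))]
b_fused = [sum(bs) for bs in zip(*(b for _, b in folded))]
y_single = conv1x1(x, w_fused, b_fused)

# The two forms agree up to floating-point rounding.
max_abs_diff = max(abs(a - b) for a, b in zip(y_multi, y_single))
print(max_abs_diff)
```

In this toy setting the fused layer stores one weight matrix and one bias vector instead of three sets of convolution and normalization parameters, and produces the same output. This equivalence is what lets a re-parameterized model use fewer parameters at inference time.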

CLC number: TP183

LIN Chen, HE Zhicheng, HUANG Yifei, LIN Zhigui, FU Guang, HUANG Jin. Research on Detection Method for Driving Scenarios Based on Multi-stage Parameter Fusion Network[J]. Journal of Mechanical Engineering, 2024, 60(10): 64-75.
