• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2025, Vol. 61 ›› Issue (10): 276-287. doi: 10.3901/JME.2025.10.276

• Transportation Engineering •


Multimodal-time-series System for Off-road Freespace Efficient-detection

Li Luxing1, Wei Chao1,2, Hu Leyun1, Sui Shuxin1, Xu Yang1, Qian Xinhao1

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081;
  2. National Key Laboratory of Special Vehicle Design and Manufacturing Integration Technology, Beijing 100081
  • Received: 2024-05-12 Revised: 2024-12-08 Published: 2025-07-12
  • About the authors: Li Luxing, male, born in 1993, PhD candidate. His main research interest is multi-sensor fusion perception for unmanned vehicles. E-mail: 3120215184@bit.edu.cn; Wei Chao (corresponding author), male, born in 1980, PhD, professor and doctoral supervisor. His main research interest is unmanned vehicle technology. E-mail: weichaobit@163.com
  • Supported by:
    National Natural Science Foundation of China (52002026).

Abstract: Freespace detection provides important support for trajectory prediction and path planning in autonomous driving; however, the complexity of off-road environments and the irregularity of drivable-area boundaries limit improvements in detection accuracy, real-time performance, and generalization. To address these challenges, a multimodal multi-sequence efficient freespace detection network is proposed. The network combines the strengths of Transformer and CNN, using adaptive spatial and temporal gating units to fuse and enhance multimodal multi-sequence information from RGB images and LiDAR data. An M-IDA module then refines the output features, further improving detection accuracy and generalization. In addition, linear Transformer encoding and depthwise separable convolutions are employed to reduce computational complexity and achieve efficient inference. Four groups of comparative experiments on the ORFD dataset show that, compared with the baseline, the proposed network improves accuracy, F1-score, and IoU by 2.3%, 1.2%, and 2%, respectively, while reducing inference time by 40.8%. Furthermore, ablation studies validate the effectiveness of each component, and additional evaluations on the ORFD test set, the KITTI Road dataset, and real-vehicle test data further confirm the network's accuracy, efficiency, and generalization capability.

Key words: deep learning, off-road, freespace detection, multi-modal, multi-sequence
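
The abstract reports gains in F1-score and IoU without stating how they are computed. The following is a minimal sketch, assuming the standard pixel-wise definitions for a binary freespace mask; the function name `freespace_metrics` and the small example masks are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
def freespace_metrics(pred, truth):
    """Return precision, recall, F1 and IoU for the freespace (positive) class.

    pred and truth are equally sized 2-D lists of 0/1 labels (1 = freespace).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # freespace pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1  # non-drivable pixel labelled as freespace
            elif p == 0 and t == 1:
                fn += 1  # freespace pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Hypothetical 3x4 masks, for illustration only (not data from the paper).
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
metrics = freespace_metrics(pred, truth)
```

With these masks there are 5 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 5/6 while IoU is 5/7. IoU penalizes both error types in a single denominator, which is why it is lower than F1 on the same prediction.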

CLC Number: