• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2024, Vol. 60 ›› Issue (10): 329-338. doi: 10.3901/JME.2024.10.329

• Intelligent Decision-Making and Planning •


Integrated Autonomous Driving Lane Change Policy Based on Temporal Difference Learning Model Predictive Control

YANG Shuo1, LI Shizhen1, ZHAO Zhongyuan2, HUANG Xiaopeng3, HUANG Yanjun1

  1. School of Automotive Studies, Tongji University, Shanghai 201804;
    2. College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044;
    3. China Electronics Technology Eastern Communication Group Co., Ltd., Guangdong 519060
  • Received: 2023-08-15  Revised: 2023-12-26  Online: 2024-05-20  Published: 2024-07-24
  • About the authors: YANG Shuo, male, born in 1995, is a doctoral candidate. His main research interests include autonomous vehicles, reinforcement learning, intelligent transportation systems, and vehicle dynamics.
    E-mail: yangshuo_jlu@163.com
    HUANG Yanjun (corresponding author), male, born in 1986, Ph.D., is a professor and doctoral supervisor. His main research interests include decision-making and planning integrating autonomous driving with artificial intelligence, motion control, and human-machine cooperative driving.
    E-mail: yanjun_huang@tongji.edu.cn
  • Funding:
    Joint Fund for Enterprise Innovation and Development of the National Natural Science Foundation of China (U23B2061).

Abstract: Autonomous vehicles are expected to achieve self-evolution in real-world environments so as to gradually cover more complex and changing scenarios. Temporal difference learning for model predictive control (TD-MPC) combines the advantages of model-based and model-free reinforcement learning methods, offering high learning efficiency and excellent performance. On this basis, in order to improve the overall performance of the automated lane change policy, an integrated automated lane change method based on TD-MPC is proposed. Specifically, an integrated lane change architecture based on a driving tendency network is proposed, the reinforcement learning problem is formulated, and a complete reward function is designed, so that the decision-making and planning optimization problem is solved in a unified way. The TD-MPC algorithm is used to build an internal model that predicts future states and rewards, realizing local trajectory optimization over a short horizon, while temporal difference learning is used to estimate the long-term return and thus obtain the parameters of the driving tendency network. The proposed method is verified in a high-fidelity simulation environment. The results show that, compared with a rule-based scheme, the proposed method maintains driving efficiency while improving safety and comfort; compared with the soft actor-critic (SAC) algorithm, learning efficiency is improved by 7 to 9 times.

Key words: autonomous driving, reinforcement learning, integrated decision making and planning

CLC Number:
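
To make the TD-MPC mechanism summarized in the abstract more concrete, the sketch below illustrates the general idea under stated assumptions: a learned internal model rolls candidate action sequences forward over a short horizon and scores them with predicted rewards plus a terminal value estimate, while the value function itself is trained by temporal-difference learning to approximate the long-term return. The toy one-dimensional dynamics, reward, sampling planner, and hyperparameters are illustrative assumptions, not the authors' implementation or their driving tendency network.

```python
# Minimal illustrative TD-MPC-style sketch: a (here hand-written) internal model is used
# for short-horizon trajectory optimization, and a value function trained by
# temporal-difference learning estimates the long-term return beyond the horizon.
import numpy as np

GAMMA = 0.99        # discount factor
HORIZON = 5         # short planning horizon (number of model rollout steps)
N_CANDIDATES = 64   # sampled action sequences evaluated per planning step
rng = np.random.default_rng(0)

def internal_model(state, action):
    """Stand-in for the learned internal model: predicts next state and reward.
    A toy 1-D system (drive the state toward zero) replaces the learned network."""
    next_state = state + 0.1 * action
    reward = -abs(next_state)            # higher reward for staying near the origin
    return next_state, reward

# Linear value function V(s) = w^T phi(s), trained by temporal-difference learning.
w = np.zeros(2)

def features(state):
    return np.array([state, 1.0])

def value(state):
    return float(w @ features(state))

def plan(state):
    """Short-horizon local optimization: roll candidate action sequences through the
    model and score them with predicted rewards plus a terminal value estimate."""
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(N_CANDIDATES):
        actions = rng.uniform(-1.0, 1.0, size=HORIZON)
        s, ret = state, 0.0
        for t, a in enumerate(actions):
            s, r = internal_model(s, a)
            ret += (GAMMA ** t) * r
        ret += (GAMMA ** HORIZON) * value(s)   # long-term return beyond the horizon
        if ret > best_return:
            best_return, best_first_action = ret, actions[0]
    return best_first_action

def td_update(state, reward, next_state, lr=0.01):
    """One-step temporal-difference update of the value function used by the planner."""
    global w
    td_error = reward + GAMMA * value(next_state) - value(state)
    w = w + lr * td_error * features(state)

# Tiny interaction loop: plan one action, step the (toy) environment, update the value.
state = 2.0
for _ in range(50):
    action = plan(state)
    next_state, reward = internal_model(state, action)   # toy env reuses the model
    td_update(state, reward, next_state)
    state = next_state
print(f"final state: {state:.3f}, value weights: {np.round(w, 3)}")
```

In the paper's setting, the toy model and linear value function would correspond to learned networks, and the planner's output would drive the integrated lane change decision and trajectory rather than a scalar toy state.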