• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2023, Vol. 59, Issue (8): 224-234. doi: 10.3901/JME.2023.08.224

• Vehicle Engineering •


TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles

ZHANG Zhi-yong1,2, HUANG Da-yang2, HUANG Cai-xia1,3, HU Lin2, DU Rong-hua2   

  1. Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment, Changsha University of Science and Technology, Changsha 410114;
    2. College of Automobile and Mechanical Engineering, Changsha University of Science and Technology, Changsha 410114;
    3. Hunan Provincial Key Laboratory of Automotive Power and Transmission System, Hunan Institute of Technology, Xiangtan 411104
  • Received: 2022-02-07 Revised: 2022-10-25 Online: 2023-04-20 Published: 2023-06-16
  • Corresponding author: DU Rong-hua, male, born in 1973, PhD, professor. His main research interests are active safety control of intelligent vehicles, intelligent transportation, and vehicle-infrastructure cooperation. E-mail: csdrh@163.com
  • About the author: ZHANG Zhi-yong, male, born in 1976, PhD, associate professor. His main research interests are active safety control of intelligent vehicles and vehicle dynamics and control. E-mail: zzy04@163.com
  • Supported by:
    National Natural Science Foundation of China (61973047); Natural Science Foundation of Hunan Province (2021JJ30182, 2022JJ50020); Scientific Research Project of Hunan Provincial Department of Education (20A018); Open Fund of the Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment (Changsha University of Science and Technology) (2020YB02)


Abstract: To improve the overall performance of automated lane-merging strategies, the Q-value estimation method of the twin delayed deep deterministic policy gradient (TD3) algorithm and the reward function are improved. The vehicle lane-merging process is formulated as a reinforcement learning problem via a Markov decision process (MDP), and the influence of Q-value underestimation in the TD3 algorithm on lane-merging decisions is analyzed. Monte Carlo dropout is applied to the twin target critic networks of TD3 to obtain two Q-value estimation samples. On that basis, a Q-value estimation method using a sample-variance-weighted average is proposed to improve the accuracy of TD3's Q-value estimates. Giving priority to the completion of the lane-merging task, a complete reward function is designed that fully accounts for safety, comfort, and traffic efficiency during lane merging. Based on the improved TD3 algorithm and the reward function, lane-merging strategies for autonomous vehicles are learned and tested in the BARK simulator. The results show that the improved TD3 algorithm significantly improves the accuracy of Q-value estimation. Combined with the established reward function, it improves the safety and ride comfort of lane merging while maintaining traffic efficiency.

Key words: autonomous vehicle, reinforcement learning, lane-merging strategy, Q-value estimation
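The following Python sketch makes the Q-value estimation issue concrete by contrasting the standard TD3 clipped double-Q target with a variance-weighted alternative. TD3 builds its target as r + γ·min(Q1′, Q2′). The minimum of two noisy but unbiased estimates is biased low, so this target tends to underestimate Q, which is the effect the abstract analyzes. The abstract does not state the exact weighting rule. The sketch therefore assumes that each dropout-enabled target critic is evaluated several times, yielding a mean and a sample variance, and that the two critics are combined by inverse-variance weighting. The function names (`td3_target`, `variance_weighted_q`, `weighted_target`, `mean_of_min_bias`) and all sample values are hypothetical. This is an illustration only, not the authors' formula.

```python
import random
import statistics


def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Standard TD3 clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2')."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def variance_weighted_q(samples1, samples2, eps=1e-8):
    """ASSUMED scheme: inverse-variance weighted mean of two target critics.

    samples1 / samples2 are repeated Monte Carlo dropout evaluations of the
    target critics Q1' and Q2' at the same (s', a'). The critic whose samples
    vary less receives the larger weight. Needs >= 2 samples per critic.
    """
    m1, m2 = statistics.fmean(samples1), statistics.fmean(samples2)
    v1, v2 = statistics.variance(samples1), statistics.variance(samples2)
    w1, w2 = 1.0 / (v1 + eps), 1.0 / (v2 + eps)
    return (w1 * m1 + w2 * m2) / (w1 + w2)


def weighted_target(reward, samples1, samples2, gamma=0.99, done=False):
    """Hypothetical target that replaces min(Q1', Q2') with the weighted estimate."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * variance_weighted_q(samples1, samples2)


def mean_of_min_bias(true_q=10.0, noise=1.0, trials=10000, seed=0):
    """Monte Carlo check: the min of two unbiased noisy estimates is biased low."""
    rng = random.Random(seed)
    mins = [min(rng.gauss(true_q, noise), rng.gauss(true_q, noise))
            for _ in range(trials)]
    return statistics.fmean(mins)


if __name__ == "__main__":
    print(f"E[min(Q1, Q2)] ~ {mean_of_min_bias():.3f} (true Q = 10.0)")
    rng = random.Random(1)
    # Stand-ins for 20 dropout passes through each target critic (made-up numbers).
    s1 = [rng.gauss(9.5, 0.2) for _ in range(20)]
    s2 = [rng.gauss(10.5, 1.0) for _ in range(20)]
    m1, m2 = statistics.fmean(s1), statistics.fmean(s2)
    print("TD3 target:     ", round(td3_target(1.0, m1, m2), 3))
    print("Weighted target:", round(weighted_target(1.0, s1, s2), 3))
```

The weights are positive and normalized, so the weighted estimate always lies between the two critics' means. It therefore can never fall below the TD3 minimum computed from the same means. Whether that improves accuracy in practice depends on the critics' actual error distributions, which is what the paper evaluates.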

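The abstract describes a reward function that puts completion of the lane merge first and then trades off safety, comfort, and traffic efficiency, but it does not give the individual terms. The following minimal Python sketch shows one such shaping. It assumes a time-headway penalty for safety, squared acceleration and jerk for comfort, and deviation from a desired speed for efficiency. Every field of the hypothetical `MergeStep` record, every weight, and every threshold below is an assumption chosen for illustration, not a value from the paper.

```python
from dataclasses import dataclass


@dataclass
class MergeStep:
    """Hypothetical per-step summary of the ego vehicle's situation."""
    merged: bool         # ego has completed the lane merge
    collided: bool       # any collision occurred this step
    off_road: bool       # ego left the drivable area
    speed: float         # ego speed [m/s]
    accel: float         # longitudinal acceleration [m/s^2]
    jerk: float          # longitudinal jerk [m/s^3]
    min_gap_time: float  # smallest time headway to surrounding vehicles [s]


def merge_reward(step, v_desired=15.0, t_safe=1.5,
                 w_safe=1.0, w_comfort=0.05, w_eff=0.5,
                 r_success=10.0, r_fail=-10.0):
    """Illustrative shaped reward; all weights and thresholds are assumptions."""
    # Task outcome first: terminal events return immediately.
    if step.collided or step.off_road:
        return r_fail
    if step.merged:
        return r_success
    # Safety: penalize time headways shorter than t_safe (0 when gap is large).
    r_safe = -w_safe * max(0.0, t_safe - step.min_gap_time) / t_safe
    # Comfort: penalize large acceleration and jerk.
    r_comfort = -w_comfort * (step.accel ** 2 + step.jerk ** 2)
    # Efficiency: penalize relative deviation from the desired speed.
    r_eff = -w_eff * abs(step.speed - v_desired) / v_desired
    return r_safe + r_comfort + r_eff


if __name__ == "__main__":
    cruising = MergeStep(False, False, False, 15.0, 0.0, 0.0, 3.0)
    tailgating = MergeStep(False, False, False, 15.0, 0.0, 0.0, 0.5)
    print(merge_reward(cruising), merge_reward(tailgating))
```

Returning the terminal reward before any shaping term is one simple way to encode "completion first". An alternative is to add the shaping terms on every step and rely on the terminal magnitudes to dominate them.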