TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles

Journal of Mechanical Engineering, 2023, Vol. 59, Issue (8): 224-234. doi: 10.3901/JME.2023.08.224

ZHANG Zhi-yong^1,2, HUANG Da-yang^2, HUANG Cai-xia^1,3, HU Lin^2, DU Rong-hua^2

1. Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment, Changsha University of Science and Technology, Changsha 410114;
2. College of Automobile and Mechanical Engineering, Changsha University of Science and Technology, Changsha 410114;
3. Hunan Provincial Key Laboratory of Automotive Power and Transmission System, Hunan Institute of Technology, Xiangtan 411104

Received: 2022-02-07 Revised: 2022-10-25 Online: 2023-04-20 Published: 2023-06-16

Corresponding author: DU Rong-hua, male, born in 1973, PhD, professor. Main research interests: active safety control of intelligent vehicles, intelligent transportation, and vehicle-infrastructure cooperation technology. E-mail: csdrh@163.com
About the author: ZHANG Zhi-yong, male, born in 1976, PhD, associate professor. Main research interests: active safety control of intelligent vehicles, vehicle dynamics and control. E-mail: zzy04@163.com
Funding: Supported by the National Natural Science Foundation of China (61973047), the Natural Science Foundation of Hunan Province (2021JJ30182, 2022JJ50020), the Scientific Research Project of the Hunan Provincial Department of Education (20A018), and the Open Fund of the Hunan Province Key Laboratory of Intelligent Manufacturing Technology for High-performance Mechanical Equipment (Changsha University of Science and Technology) (2020YB02).

Abstract

Abstract: To improve the overall performance of automated lane-merging strategies, both the Q-value estimation method of the twin delayed deep deterministic policy gradient (TD3) algorithm and the reward function are improved. The vehicle lane-merging process is modeled as a reinforcement learning problem through a Markov decision process, and the influence of Q-value underestimation in the TD3 algorithm on lane-merging decisions is analyzed. Monte Carlo dropout is applied to the twin target critic networks of TD3 to obtain two Q-value estimation samples, and on this basis a Q-value estimation method based on a sample-variance-weighted average is proposed to improve the Q-value estimation accuracy of TD3. Giving priority to completing the lane-merging task, a comprehensive reward function is designed that fully accounts for safety, comfort, and traffic efficiency during merging. Based on the improved TD3 algorithm and the reward function, a lane-merging strategy for autonomous vehicles is learned and tested in the BARK simulator. The results show that the improved TD3 algorithm significantly improves the accuracy of Q-value estimation and, combined with the proposed reward function, improves the safety and ride comfort of lane-merging while maintaining traffic efficiency.

Key words: autonomous vehicle, reinforcement learning, lane-merging strategy, Q-value estimation
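The Q-value underestimation that the abstract attributes to TD3 can be made concrete with a standard result for the minimum of two Gaussian random variables. As an illustrative assumption only (the paper's own analysis may use a more general model), suppose the two target critics return independent, unbiased estimates of the true value Q, each with variance σ². The minimum taken by TD3's clipped double-Q target is then biased downward:

```latex
% Illustrative assumption: \hat{Q}_1, \hat{Q}_2 \overset{\text{i.i.d.}}{\sim} \mathcal{N}(Q, \sigma^2)
\begin{aligned}
\min(\hat{Q}_1, \hat{Q}_2) &= \frac{\hat{Q}_1 + \hat{Q}_2}{2} - \frac{\lvert \hat{Q}_1 - \hat{Q}_2 \rvert}{2},
  \qquad \hat{Q}_1 - \hat{Q}_2 \sim \mathcal{N}(0, 2\sigma^2) \\
\mathbb{E}\,\lvert \hat{Q}_1 - \hat{Q}_2 \rvert &= \sqrt{2}\,\sigma \sqrt{\tfrac{2}{\pi}} = \frac{2\sigma}{\sqrt{\pi}} \\
\mathbb{E}\bigl[\min(\hat{Q}_1, \hat{Q}_2)\bigr] &= Q - \frac{\sigma}{\sqrt{\pi}} \approx Q - 0.564\,\sigma
\end{aligned}
```

Under this assumption the bias grows linearly with the critics' estimation noise σ. Because the target is bootstrapped, the downward bias can also propagate across updates, which motivates estimators that do not rely solely on the minimum.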
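The abstract names the estimator ("a weighted average based on sample variance" of two Monte Carlo dropout Q samples) but does not state its exact formula. The sketch below illustrates the idea under explicit assumptions: each target critic is a tiny hypothetical ReLU network with inverted dropout on its hidden layer; each critic is evaluated several times with dropout left on (Monte Carlo dropout); and the two per-critic means are combined with inverse-variance weights. Inverse-variance weighting is an assumed rule, not necessarily the authors' method, and all function names, layer sizes, the dropout rate, and the number of passes are hypothetical. For contrast, `clipped_double_q` is the standard TD3 minimum, and `min_of_two_bias` estimates that minimum's bias numerically.

```python
import math
import random
import statistics


def critic_forward(weights_in, weights_out, x, drop_p, rng):
    """One stochastic forward pass of a tiny HYPOTHETICAL critic: a ReLU hidden
    layer with inverted dropout, followed by a linear output unit."""
    hidden = []
    for row in weights_in:
        h = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU activation
        keep = rng.random() >= drop_p  # Bernoulli keep mask for this hidden unit
        hidden.append(h / (1.0 - drop_p) if keep else 0.0)  # inverted-dropout scaling
    return sum(w * h for w, h in zip(weights_out, hidden))


def mc_dropout_samples(weights_in, weights_out, x, drop_p, n_passes, rng):
    """Monte Carlo dropout: repeat the forward pass with dropout kept active."""
    return [critic_forward(weights_in, weights_out, x, drop_p, rng)
            for _ in range(n_passes)]


def variance_weighted_q(samples_1, samples_2, eps=1e-8):
    """ASSUMED combination rule (inverse-variance weighting): the critic whose
    MC-dropout samples vary less receives the larger weight."""
    m1, m2 = statistics.fmean(samples_1), statistics.fmean(samples_2)
    v1, v2 = statistics.variance(samples_1), statistics.variance(samples_2)
    w1, w2 = 1.0 / (v1 + eps), 1.0 / (v2 + eps)  # eps guards against zero variance
    return (w1 * m1 + w2 * m2) / (w1 + w2)


def clipped_double_q(q1, q2):
    """Standard TD3 target estimate: the minimum of the two target critics."""
    return min(q1, q2)


def min_of_two_bias(sigma, n_trials, seed=0):
    """Monte Carlo estimate of E[min(Q + e1, Q + e2)] - Q for i.i.d. e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return total / n_trials


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -0.2, 1.0]  # hypothetical state-action feature vector
    samples = []
    for _ in range(2):  # two hypothetical target critics with random weights
        w_in = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(8)]
        w_out = [rng.uniform(-1.0, 1.0) for _ in range(8)]
        samples.append(mc_dropout_samples(w_in, w_out, x, 0.1, 30, rng))
    q1, q2 = statistics.fmean(samples[0]), statistics.fmean(samples[1])
    print("clipped double-Q target:", clipped_double_q(q1, q2))
    print("variance-weighted target:", variance_weighted_q(samples[0], samples[1]))
    print("empirical bias of min (sigma=1):", min_of_two_bias(1.0, 100_000))
    print("closed form -sigma/sqrt(pi):", -1.0 / math.sqrt(math.pi))
```

Unlike the minimum, a weighted average of two unbiased estimates remains unbiased. The weighting only controls how much each critic's noise contributes, shifting trust toward the critic whose dropout samples are more consistent.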

CLC number: U461

ZHANG Zhi-yong, HUANG Da-yang, HUANG Cai-xia, HU Lin, DU Rong-hua. TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles[J]. Journal of Mechanical Engineering, 2023, 59(8): 224-234.

