TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles

doi:10.3901/JME.2023.08.224

Abstract

To improve the overall performance of automated lane merging, both the Q-value estimation method of the twin delayed deep deterministic policy gradient (TD3) algorithm and the reward function are improved. The lane-merging problem is formalized as a Markov decision process, and the influence of TD3's Q-value underestimation on the lane-merging strategy is analyzed. Monte Carlo dropout is applied to the twin target critic networks to obtain two Q-value estimation samples, and a Q-value estimation method based on a weighted average using the sample variances is proposed to improve estimation accuracy. Giving priority to completing the merge, a more comprehensive reward function is designed that also accounts for safety, comfort, and traffic efficiency. Based on the improved TD3 algorithm and reward function, a lane-merging strategy for autonomous vehicles is learned and verified in the BARK simulator. The results show that the improved TD3 algorithm significantly increases the accuracy of Q-value estimation and, combined with the proposed reward function, improves the safety and ride comfort of lane merging while maintaining traffic efficiency.
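The abstract does not state the exact weighting formula, so the following Python sketch is only an illustration under stated assumptions. Each target critic is evaluated several times with dropout left active (Monte Carlo dropout), and the two critics' mean estimates are combined by inverse-variance weighting, so the less uncertain critic receives more weight. Standard TD3 instead takes the minimum of the two target critics (clipped double-Q), which tends to bias the target downward. The toy linear critic `make_toy_critic`, the dropout rate, the sample count, and the inverse-variance weights are all hypothetical; the target actor and target policy smoothing are omitted.

```python
import random
import statistics


def make_toy_critic(weights, p_drop=0.2):
    """Hypothetical stand-in for a target critic: one linear layer with inverted dropout."""
    keep = 1.0 - p_drop

    def critic(state, action, rng):
        x = list(state) + list(action)
        # Drop each input unit with probability p_drop and rescale survivors by 1/keep.
        return sum(w * xi / keep for w, xi in zip(weights, x) if rng.random() >= p_drop)

    return critic


def mc_dropout_samples(critic, state, action, n_samples, rng):
    """Monte Carlo dropout: repeat stochastic forward passes with dropout kept on."""
    return [critic(state, action, rng) for _ in range(n_samples)]


def td3_target_q(samples_1, samples_2):
    """Clipped double-Q as in standard TD3, applied here to the MC means for comparison."""
    return min(statistics.fmean(samples_1), statistics.fmean(samples_2))


def weighted_target_q(samples_1, samples_2, eps=1e-8):
    """Assumed variant: inverse-variance weighted average of the two critics' means."""
    m1, m2 = statistics.fmean(samples_1), statistics.fmean(samples_2)
    w1 = 1.0 / (statistics.variance(samples_1) + eps)  # sample variance, n - 1 denominator
    w2 = 1.0 / (statistics.variance(samples_2) + eps)
    return (w1 * m1 + w2 * m2) / (w1 + w2)


def td_target(reward, done, gamma, q_next):
    """Bellman target y = r + gamma * (1 - done) * Q_next."""
    return reward + gamma * (1.0 - done) * q_next


if __name__ == "__main__":
    rng = random.Random(0)
    critic_1 = make_toy_critic([0.5, -0.2, 0.3, 0.1])
    critic_2 = make_toy_critic([0.4, -0.1, 0.2, 0.2])
    state, action = [1.0, 2.0, 0.5], [0.8]
    s1 = mc_dropout_samples(critic_1, state, action, 30, rng)
    s2 = mc_dropout_samples(critic_2, state, action, 30, rng)
    q_weighted = weighted_target_q(s1, s2)
    print("TD3 min target:", td3_target_q(s1, s2))
    print("variance-weighted target:", q_weighted)
    print("Bellman target:", td_target(1.0, 0.0, 0.99, q_weighted))
```

With equal variances the weighted target reduces to the plain mean of the two critics. Because it is a convex combination of the two means, it can never fall below the TD3 minimum, which is how this sketch counteracts underestimation; whether the paper uses this particular weighting is not confirmed by the abstract.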

Key words: autonomous vehicle, reinforcement learning, lane-merging strategy, Q-value estimation

CLC Number: U461

ZHANG Zhi-yong, HUANG Da-yang, HUANG Cai-xia, HU Lin, DU Rong-hua. TD3 Algorithm Improving and Lane-merging Strategy Learning for Autonomous Vehicles[J]. Journal of Mechanical Engineering, 2023, 59(8): 224-234.
