• CN: 11-2187/TH
  • ISSN: 0577-6686

Journal of Mechanical Engineering ›› 2024, Vol. 60 ›› Issue (10): 329-338. doi: 10.3901/JME.2024.10.329


Integrated Autonomous Driving Lane Change Policy Based on Temporal Difference Learning Model Predictive Control

YANG Shuo1, LI Shizhen1, ZHAO Zhongyuan2, HUANG Xiaopeng3, HUANG Yanjun1   

  1. School of Automotive Studies, Tongji University, Shanghai 201804;
    2. College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044;
    3. China Electronics Technology Eastern Communication Group Co., Ltd., Guangdong 519060
• Received: 2023-08-15  Revised: 2023-12-26  Online: 2024-05-20  Published: 2024-07-24

Abstract: Autonomous vehicles are expected to self-evolve in real-world environments so as to gradually cover more complex and varied scenarios. Temporal difference learning for model predictive control (TD-MPC) combines the advantages of model-based and model-free reinforcement learning, offering both high learning efficiency and excellent performance. On this basis, to improve the overall performance of the automated lane change policy, an integrated automated lane change method based on TD-MPC is proposed. Specifically, an integrated architecture built on a driving-propensity network is proposed; the reinforcement learning problem is formulated and a complete reward function is designed, so that the decision-making and planning optimization problems are solved in a unified way. The TD-MPC algorithm uses an internal model to predict future states and rewards, thereby achieving local trajectory optimization over a short horizon; at the same time, temporal difference learning is used to estimate the long-term return and thus obtain the parameters of the driving-propensity network. The proposed method is validated in a high-fidelity simulation environment. The results show that, compared with a conventional scheme, the proposed method maintains driving efficiency while improving safety and comfort; compared with the soft actor-critic (SAC) algorithm, learning efficiency is improved by 7 to 9 times.
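The planning step the abstract describes (an internal model predicting short-horizon states and rewards, with temporal-difference value estimates capturing the long-term return) can be illustrated with a minimal sketch. This is a toy illustration only, not the paper's method: the dynamics, reward, and value functions below are stand-in linear models rather than the learned networks used in TD-MPC, and all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 5, 64
GAMMA = 0.99

# Stand-in linear models (in TD-MPC these are learned neural networks).
A = 0.95 * np.eye(STATE_DIM)                        # dynamics: s' = A s + B a
B = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
w_r = rng.normal(size=STATE_DIM)                    # reward:  r(s) = w_r . s
w_v = rng.normal(size=STATE_DIM)                    # value:   V(s) = w_v . s (TD-learned in practice)

def plan(state):
    """Score sampled action sequences over a short horizon and return the
    first action of the best one: sum of predicted discounted rewards plus
    a terminal TD value estimate standing in for the long-term return."""
    actions = rng.normal(size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    returns = np.zeros(N_CANDIDATES)
    s = np.tile(state, (N_CANDIDATES, 1))
    for t in range(HORIZON):
        returns += (GAMMA ** t) * (s @ w_r)         # predicted per-step rewards
        s = s @ A.T + actions[:, t] @ B.T           # model rollout one step
    returns += (GAMMA ** HORIZON) * (s @ w_v)       # bootstrap with value estimate
    return actions[np.argmax(returns), 0]

first_action = plan(np.ones(STATE_DIM))
print(first_action.shape)
```

In the full algorithm the sampled sequences would be refined iteratively (e.g. with a cross-entropy-method loop) and the value function trained by temporal-difference learning; here a single round of random sampling keeps the structure visible.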

Key words: autonomous driving, reinforcement learning, integrated decision making and planning
