• CN: 11-2187/TH
  • ISSN: 0577-6686

Journal of Mechanical Engineering ›› 2024, Vol. 60 ›› Issue (10): 329-338. doi: 10.3901/JME.2024.10.329


Integrated Autonomous Driving Lane Change Policy Based on Temporal Difference Learning Model Predictive Control

YANG Shuo1, LI Shizhen1, ZHAO Zhongyuan2, HUANG Xiaopeng3, HUANG Yanjun1   

  1. School of Automotive Studies, Tongji University, Shanghai 201804;
    2. College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044;
    3. China Electronics Technology Eastern Communication Group Co., Ltd., Guangdong 519060
• Received: 2023-08-15  Revised: 2023-12-26  Online: 2024-05-20  Published: 2024-07-24

Abstract: Autonomous vehicles are expected to self-evolve in real-world environments so as to gradually cover more complex and varied scenarios. Temporal difference learning for model predictive control (TD-MPC) combines the advantages of model-based and model-free reinforcement learning, offering both high learning efficiency and excellent performance. On this basis, to improve the overall performance of the automated lane change policy, an integrated automated lane change method based on TD-MPC is proposed. Specifically, an integrated architecture built on a driving-propensity network is proposed; the reinforcement learning problem is formulated and a complete reward function is designed, so that the decision-making and planning optimization problems are solved in a unified way. The TD-MPC algorithm uses an internal model to predict future states and rewards, thereby achieving local trajectory optimization over a short horizon; at the same time, temporal difference learning is used to estimate the long-term return and thus obtain the parameters of the driving-propensity network. The proposed method is validated in a high-fidelity simulation environment. The results show that, compared with a conventional scheme, the proposed method maintains driving efficiency while improving safety and comfort; compared with the soft actor-critic (SAC) algorithm, learning efficiency is improved by 7 to 9 times.
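The planning step the abstract describes (an internal model predicting short-horizon states and rewards, with temporal-difference value estimates capturing the long-term return) can be illustrated with a minimal sketch. This is a toy illustration only, not the paper's method: the dynamics, reward, and value functions below are stand-in linear models rather than the learned networks used in TD-MPC, and all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 5, 64
GAMMA = 0.99

# Stand-in linear models (in TD-MPC these are learned neural networks).
A = 0.95 * np.eye(STATE_DIM)                        # dynamics: s' = A s + B a
B = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
w_r = rng.normal(size=STATE_DIM)                    # reward:  r(s) = w_r . s
w_v = rng.normal(size=STATE_DIM)                    # value:   V(s) = w_v . s (TD-learned in practice)

def plan(state):
    """Score sampled action sequences over a short horizon and return the
    first action of the best one: sum of predicted discounted rewards plus
    a terminal TD value estimate standing in for the long-term return."""
    actions = rng.normal(size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    returns = np.zeros(N_CANDIDATES)
    s = np.tile(state, (N_CANDIDATES, 1))
    for t in range(HORIZON):
        returns += (GAMMA ** t) * (s @ w_r)         # predicted per-step rewards
        s = s @ A.T + actions[:, t] @ B.T           # model rollout one step
    returns += (GAMMA ** HORIZON) * (s @ w_v)       # bootstrap with value estimate
    return actions[np.argmax(returns), 0]

first_action = plan(np.ones(STATE_DIM))
print(first_action.shape)
```

In the full algorithm the sampled sequences would be refined iteratively (e.g. with a cross-entropy-method loop) and the value function trained by temporal-difference learning; here a single round of random sampling keeps the structure visible.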

Key words: autonomous driving, reinforcement learning, integrated decision making and planning
