• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2024, Vol. 60 ›› Issue (10): 329-338. doi: 10.3901/JME.2024.10.329

• Intelligent Decision-Making and Planning •


Integrated Autonomous Driving Lane Change Policy Based on Temporal Difference Learning Model Predictive Control

YANG Shuo1, LI Shizhen1, ZHAO Zhongyuan2, HUANG Xiaopeng3, HUANG Yanjun1

  1. School of Automotive Studies, Tongji University, Shanghai 201804;
    2. College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044;
    3. China Electronics Technology Eastern Communication Group Co., Ltd., Guangdong 519060
  • Received: 2023-08-15  Revised: 2023-12-26  Online: 2024-05-20  Published: 2024-07-24
  • About the authors: YANG Shuo, male, born in 1995, is a doctoral candidate. His main research interests include autonomous vehicles, reinforcement learning, intelligent transportation systems, and vehicle dynamics.
    E-mail: yangshuo_jlu@163.com
    HUANG Yanjun (corresponding author), male, born in 1986, Ph.D., is a professor and doctoral supervisor. His main research interests include decision-making and planning integrating autonomous driving with artificial intelligence, motion control, and human-machine cooperative driving.
    E-mail: yanjun_huang@tongji.edu.cn
  • Funding:
    Joint Fund for Enterprise Innovation and Development of the National Natural Science Foundation of China (U23B2061).

Abstract: Autonomous vehicles are expected to achieve self-evolution in real-world environments so as to gradually cover more complex and changing scenarios. Temporal difference learning for model predictive control (TD-MPC) combines the advantages of model-based and model-free reinforcement learning methods, offering high learning efficiency and excellent performance. On this basis, in order to improve the overall performance of the automated lane change policy, an integrated automated lane change method based on TD-MPC is proposed. Specifically, an integrated lane change architecture based on a driving tendency network is proposed, the reinforcement learning problem is formulated, and a complete reward function is designed, so that the decision-making and planning optimization problem is solved in a unified way. The TD-MPC algorithm is used to build an internal model that predicts future states and rewards, realizing local trajectory optimization over a short horizon, while temporal difference learning is used to estimate the long-term return and thus obtain the parameters of the driving tendency network. The proposed method is verified in a high-fidelity simulation environment. The results show that, compared with a rule-based scheme, the proposed method maintains driving efficiency while improving safety and comfort; compared with the soft actor-critic (SAC) algorithm, learning efficiency is improved by 7 to 9 times.

Key words: autonomous driving, reinforcement learning, integrated decision making and planning

CLC Number:
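
To make the TD-MPC mechanism summarized in the abstract more concrete, the sketch below illustrates the general idea under stated assumptions: a learned internal model rolls candidate action sequences forward over a short horizon and scores them with predicted rewards plus a terminal value estimate, while the value function itself is trained by temporal-difference learning to approximate the long-term return. The toy one-dimensional dynamics, reward, sampling planner, and hyperparameters are illustrative assumptions, not the authors' implementation or their driving tendency network.

```python
# Minimal illustrative TD-MPC-style sketch: a (here hand-written) internal model is used
# for short-horizon trajectory optimization, and a value function trained by
# temporal-difference learning estimates the long-term return beyond the horizon.
import numpy as np

GAMMA = 0.99        # discount factor
HORIZON = 5         # short planning horizon (number of model rollout steps)
N_CANDIDATES = 64   # sampled action sequences evaluated per planning step
rng = np.random.default_rng(0)

def internal_model(state, action):
    """Stand-in for the learned internal model: predicts next state and reward.
    A toy 1-D system (drive the state toward zero) replaces the learned network."""
    next_state = state + 0.1 * action
    reward = -abs(next_state)            # higher reward for staying near the origin
    return next_state, reward

# Linear value function V(s) = w^T phi(s), trained by temporal-difference learning.
w = np.zeros(2)

def features(state):
    return np.array([state, 1.0])

def value(state):
    return float(w @ features(state))

def plan(state):
    """Short-horizon local optimization: roll candidate action sequences through the
    model and score them with predicted rewards plus a terminal value estimate."""
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(N_CANDIDATES):
        actions = rng.uniform(-1.0, 1.0, size=HORIZON)
        s, ret = state, 0.0
        for t, a in enumerate(actions):
            s, r = internal_model(s, a)
            ret += (GAMMA ** t) * r
        ret += (GAMMA ** HORIZON) * value(s)   # long-term return beyond the horizon
        if ret > best_return:
            best_return, best_first_action = ret, actions[0]
    return best_first_action

def td_update(state, reward, next_state, lr=0.01):
    """One-step temporal-difference update of the value function used by the planner."""
    global w
    td_error = reward + GAMMA * value(next_state) - value(state)
    w = w + lr * td_error * features(state)

# Tiny interaction loop: plan one action, step the (toy) environment, update the value.
state = 2.0
for _ in range(50):
    action = plan(state)
    next_state, reward = internal_model(state, action)   # toy env reuses the model
    td_update(state, reward, next_state)
    state = next_state
print(f"final state: {state:.3f}, value weights: {np.round(w, 3)}")
```

In the paper's setting, the toy model and linear value function would correspond to learned networks, and the planner's output would drive the integrated lane change decision and trajectory rather than a scalar toy state.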