• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2025, Vol. 62 ›› Issue (6): 302-313. doi: 10.3901/JME.260194

• Vehicle Engineering •

Real-time Energy Management Strategy for Hybrid Electric Vehicles Based on Deep Reinforcement Learning and Model Predictive Control

LIU Hui1,2, MA Xiaokang1, HAN Lijin1,2, XIANG Changle1,2

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081;
  2. B&H Unmanned Intelligent System Research Institute, Beijing Institute of Technology, Hefei 230041
  • Received: 2025-05-14 Revised: 2026-01-18 Published: 2026-05-12
  • About the author: LIU Hui (corresponding author), female, born in 1975, PhD, professor. Her main research interests are vehicle system dynamics and the electromechanical transmission of hybrid vehicles. E-mail: lh@bit.edu.cn
  • Supported by:
    National Natural Science Foundation of China (52130512).

Abstract: To address the difficulty of reconciling real-time performance with adaptability in energy management for hybrid electric vehicles (HEVs), this paper proposes a real-time hierarchical energy management strategy (EMS) that combines deep reinforcement learning (DRL) with model predictive control (MPC). In the upper layer, a deep Q-network (DQN) is used to build an EMS controller that rapidly plans a reference trajectory for the battery state of charge (SOC) before the vehicle departs. In the lower layer, a long short-term memory (LSTM) network first serves as a velocity predictor that forecasts the velocity sequence over a future time horizon; an MPC controller then allocates the power flow optimally by tracking the SOC reference trajectory. The overall performance of the proposed strategy is compared with that of dynamic programming (DP) and rule-based strategies under different test driving cycles. Simulation results show that the proposed strategy achieves more than 90% of the fuel economy attained by the DP strategy while showing good potential for real-time application. Finally, hardware-in-the-loop (HIL) tests verify the potential of the proposed strategy in practical applications.

Key words: hybrid electric vehicle, deep Q-network, model predictive control, energy management strategy
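The lower-layer idea in the abstract, tracking an SOC reference over a short prediction horizon, can be illustrated with a minimal receding-horizon sketch. This is not the authors' controller: the battery model, the cost weights, the discrete engine power levels, and the names `step_soc` and `mpc_step` are all assumptions made for illustration. The SOC reference is simply passed in (standing in for the trajectory planned by the DQN layer), and the demand forecast is passed in as well (standing in for what a velocity predictor would supply).

```python
from itertools import product

# All numbers below are illustrative assumptions, not values from the paper.
CAPACITY_KJ = 5000.0                        # usable battery energy
DT = 1.0                                    # step length in seconds
ENGINE_LEVELS_KW = (0.0, 10.0, 20.0, 30.0)  # candidate engine powers
FUEL_PER_KJ = 1.0                           # toy fuel cost per kJ of engine energy
TRACK_WEIGHT = 1.0e6                        # weight on squared SOC tracking error


def step_soc(soc, demand_kw, engine_kw):
    """Toy battery model: the battery covers whatever the engine does not."""
    battery_kw = demand_kw - engine_kw
    return soc - battery_kw * DT / CAPACITY_KJ


def mpc_step(soc, predicted_demand_kw, soc_reference):
    """Enumerate all engine power sequences over the horizon and return
    the first action of the cheapest one (receding-horizon principle)."""
    best_cost, best_first = float("inf"), ENGINE_LEVELS_KW[0]
    horizon = len(predicted_demand_kw)
    for seq in product(ENGINE_LEVELS_KW, repeat=horizon):
        s, cost = soc, 0.0
        for engine_kw, demand_kw, ref in zip(seq, predicted_demand_kw, soc_reference):
            s = step_soc(s, demand_kw, engine_kw)
            cost += FUEL_PER_KJ * engine_kw * DT + TRACK_WEIGHT * (s - ref) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


if __name__ == "__main__":
    # SOC starts below a flat reference, so the sketch favors a higher engine power.
    print(mpc_step(0.59, [20.0, 20.0, 20.0], [0.60, 0.60, 0.60]))
```

Exhaustive enumeration is only workable for a very short horizon and a handful of power levels; a practical MPC would use a proper vehicle model and a numerical solver. Only the first action of the best sequence is applied, and the optimization is repeated at the next step with an updated forecast.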

CLC number: