• CN: 11-2187/TH
  • ISSN: 0577-6686

Journal of Mechanical Engineering ›› 2024, Vol. 60 ›› Issue (10): 245-260. doi: 10.3901/JME.2024.10.245


Bayesian Inverse Reinforcement Learning-based Reward Learning for Automated Driving

ZENG Di1, ZHENG Ling1,2, LI Yinong1,2, YANG Xiantong1   

  1. College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044;
    2. State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400044
  • Received: 2023-11-20 Revised: 2024-01-25 Online: 2024-05-20 Published: 2024-07-24

Abstract: Studying driving policies with wide-ranging scenario adaptability is crucial to realizing safe, efficient, and harmonious automated driving. Deep reinforcement learning has shown great potential in driving policy learning owing to its excellent function approximation and representation capabilities. However, designing a reward function suitable for various complex driving scenarios is extremely challenging, and the generalization ability of driving policies urgently needs improvement. To address the difficulty of designing the reward function, an approximate likelihood model of human drivers' driving policy is built that accounts for their preferences, and a method is proposed for learning an approximate posterior distribution over the reward function, represented as a Bayesian neural network, through sparse action sampling based on curve interpolation and approximate variational inference. To tackle erroneous rewards arising from the uncertainty of the reward function, an uncertainty-aware human-like driving policy learning method based on the posterior distribution over the reward function is proposed, which maximizes the expected reward while penalizing its epistemic uncertainty. The proposed methods are validated in simulated highway and urban driving scenarios built from the NGSIM US-101 and nuPlan datasets. The results show that the proposed method overcomes the poor performance of reward functions based on linear combinations of hand-crafted state features, models the uncertainty of the reward function, and improves the generalization ability of the reward function to high-dimensional nonlinear problems. The learned reward function and the learning stability are significantly better than those of the mainstream inverse reinforcement learning method. Moreover, penalizing the uncertainty of the reward function improves the human likeness and safety of the driving policy as well as the training stability. The proposed uncertainty-aware human-like driving policy significantly outperforms driving policies based on behavior cloning and maximum entropy inverse reinforcement learning.
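The two core ideas of the abstract can be illustrated with a minimal sketch (not the paper's implementation): a mean-field variational Bayesian neural-network reward trained against a Boltzmann-style approximate likelihood of expert actions over a sparse candidate-action set (a stand-in for the paper's curve-interpolation action sampling), and an uncertainty-penalized reward that subtracts a multiple of the epistemic standard deviation from the posterior mean. All names, network sizes, the KL weight, and the penalty weight beta below are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesLinear(nn.Module):
    """Mean-field variational linear layer with the reparameterization trick."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        w_std, b_std = F.softplus(self.w_rho), F.softplus(self.b_rho)
        w = self.w_mu + w_std * torch.randn_like(w_std)  # sample weights from q(w)
        b = self.b_mu + b_std * torch.randn_like(b_std)
        return F.linear(x, w, b)

    def kl(self):
        # KL divergence of q(w) = N(mu, std^2) to a standard-normal prior.
        def kl_term(mu, std):
            return (0.5 * (std.pow(2) + mu.pow(2) - 1.0) - std.log()).sum()
        return kl_term(self.w_mu, F.softplus(self.w_rho)) + kl_term(self.b_mu, F.softplus(self.b_rho))


class BayesianReward(nn.Module):
    """Reward network r(s, a); epistemic uncertainty via Monte-Carlo weight samples."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.l1 = BayesLinear(state_dim + action_dim, hidden)
        self.l2 = BayesLinear(hidden, 1)

    def forward(self, s, a):
        return self.l2(torch.relu(self.l1(torch.cat([s, a], dim=-1)))).squeeze(-1)

    def kl(self):
        return self.l1.kl() + self.l2.kl()

    def penalized_reward(self, s, a, n_samples=20, beta=1.0):
        # Uncertainty-aware reward: posterior mean minus a penalty on the epistemic
        # standard deviation ("maximize reward while penalizing epistemic uncertainty").
        samples = torch.stack([self(s, a) for _ in range(n_samples)], dim=0)
        return samples.mean(0) - beta * samples.std(0)


def variational_reward_loss(reward_net, s, expert_a, candidate_actions, kl_weight=1e-3):
    # Negative ELBO: a Boltzmann-style approximate likelihood of the expert action over a
    # sparse candidate-action set (stand-in for curve-interpolation sampling), plus the KL term.
    r_expert = reward_net(s, expert_a)                                      # (B,)
    r_cand = torch.stack([reward_net(s, a) for a in candidate_actions], 1)  # (B, K)
    log_lik = r_expert - torch.logsumexp(torch.cat([r_expert.unsqueeze(1), r_cand], dim=1), dim=1)
    return -log_lik.mean() + kl_weight * reward_net.kl()

In policy learning, penalized_reward would replace the raw reward signal, so the agent is discouraged from exploiting state-action regions where the reward posterior is uncertain.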

Key words: intelligent vehicle, automated driving, approximate variational reward learning, approximate variational inference, Bayesian inverse reinforcement learning
