• CN:11-2187/TH
  • ISSN:0577-6686

机械工程学报 (Journal of Mechanical Engineering) ›› 2022, Vol. 58 ›› Issue (11): 72-87. doi: 10.3901/JME.2022.11.072

• Robotics and Mechanisms •


A Q-Learning Path Optimization Algorithm with Smoothing and Shortcutting Capabilities

DUAN Shuyong1, ZHANG Linxin1, HAN Xu1, LIU Guirong2

  1. State Key Laboratory of Reliability and Intelligence of Electrical Equipment (Hebei University of Technology), Tianjin 300401, China;
    2. Department of Aeronautical Engineering and Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
  • Received: 2021-07-22  Revised: 2022-01-24  Online: 2022-06-05  Published: 2022-08-08
  • Corresponding author: HAN Xu, male, born in 1968, PhD, professor, doctoral supervisor. His main research interests include reliability of complex equipment and systems, computational inverse techniques, and optimization theory and algorithms. E-mail: xhan@hebut.edu.cn
  • About the authors: DUAN Shuyong, female, born in 1984, PhD, associate professor, master's supervisor. Her main research interests include robot dynamics, robot reliability, and computational inverse techniques. E-mail: duanshuyong@hebut.edu.cn; ZHANG Linxin, male, born in 1997, master's student. His main research interest is mobile robot path planning. E-mail: zlinxin@126.com; LIU Guirong, male, born in 1958, PhD, professor, doctoral supervisor. His main research interests include computational inverse techniques, robot intelligent perception, intelligent algorithms, and optimization theory and algorithms. E-mail: xingtianliu@sjtu.edu.cn
  • Supported by: National Natural Science Foundation of China (52175222), Key R&D Program of Hebei Province (19227208D), and Tianjin Science and Technology Program (19ZXZNGX00100)

Smoothed-shortcut Q-Learning Algorithm for Optimal Robot Agent Path Planning

DUAN Shuyong1, ZHANG Linxin1, HAN Xu1, LIU Guirong2   

  1. State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China;
    2. Aeronautical Engineering and Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
  • Received:2021-07-22 Revised:2022-01-24 Online:2022-06-05 Published:2022-08-08

Abstract (translated from Chinese): Rational path planning is the key to a mobile robot completing its tasks safely and efficiently. Most existing path-planning algorithms plan only after global environment information is fully known. To address path planning for mobile robots in static unknown environments, a smoothed-shortcut Q-Learning (SSQL) algorithm is therefore proposed and applied to mobile robot path planning. The algorithm improves the Agent's learning efficiency while ensuring that the path is a smooth, continuous, shortest curve, improving the robot's locomotion dynamics and efficiency. The SSQL algorithm comprises three main new schemes. First, the unknown environment is pre-explored with the Q-Learning algorithm; once the Agent first reaches the goal, a virtual rectangular environment is constructed from the pre-exploration information, and guidance Q values are added inside it to improve the Agent's learning efficiency. Second, the path found by the Agent is optimized by jump points, eliminating redundant segments, reducing turning points, and shortening the path length. Third, Bezier curves are used to smooth the path at its turning positions, so that the final path satisfies the mobile robot's dynamic constraints. The algorithm is compared with the Q-Learning algorithm in different environments. The results show that the SSQL path-planning algorithm performs excellently in exploring large unknown environments, offering fast convergence, short planned paths, and few turning points, while ensuring smooth and safe traversal of the planned path by the mobile robot.
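The guidance-Q-value scheme in the first step can be illustrated with a minimal tabular sketch. This is not the paper's code: the function names, the uniform bonus value, and the action set are assumptions; the paper derives its guidance values from the pre-exploration episode, while here a fixed bonus is simply seeded inside the virtual rectangle spanned by the start and goal cells.

```python
def guided_q_init(width, height, start, goal, bonus=0.5):
    """Build a tabular Q-function for a grid world, seeding a guidance
    bonus inside the virtual rectangle spanned by start and goal.
    (Illustrative only; the published guidance scheme may differ.)"""
    (x0, y0), (x1, y1) = start, goal
    xmin, xmax = sorted((x0, x1))
    ymin, ymax = sorted((y0, y1))
    q = {}
    for x in range(width):
        for y in range(height):
            inside = xmin <= x <= xmax and ymin <= y <= ymax
            # States inside the rectangle start with a positive Q value,
            # biasing the Agent's greedy choices toward the goal region.
            q[(x, y)] = {a: (bonus if inside else 0.0)
                         for a in ("up", "down", "left", "right")}
    return q

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-Learning update."""
    q[s][a] += alpha * (r + gamma * max(q[s_next].values()) - q[s][a])
```

Because the seeded values act only as optimistic initial estimates, the usual Q-Learning convergence behavior is preserved while exploration is steered toward the rectangle.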

Keywords: mobile robot, Q-Learning, guidance Q value, jump-point optimization, Bezier curve, path smoothing

Abstract: Quality path planning is the key for a mobile robot to complete its task safely, efficiently and smoothly. Such planning must often be done in an environment that is initially unknown to the Agent, so effective reinforcement learning is required. A smoothed-shortcut Q-Learning (SSQL) algorithm is presented that enables the Agent to learn, and then work out, a smooth shortcut path to a goal that is initially unknown in a given environment. SSQL is proposed so that a mobile robot can effectively reach its goal in an unfamiliar environment along a path that is a smooth, continuous curve of shortest distance. The SSQL algorithm consists of three major ingredients. First, a virtual rectangular boundary of the environment is constructed from the pre-explored information, and the guidance Q values inside this rectangle are increased to improve the learning efficiency of the Agent. Second, the path found by the Agent is optimized by finding shortcuts along it, eliminating redundant segments and reducing zig-zag portions, thereby minimizing the total distance between the start and target points. Third, Bezier curves are used to smooth the path at its turning positions, improving the dynamics of the robot's movement. The final path generated by the SSQL algorithm is thus optimized for convergence speed, smoothness and distance. The SSQL algorithm is tested against the standard Q-Learning algorithm in environments with various obstacle densities and learning rates. The results show that SSQL indeed achieves fast convergence and short, smooth paths with few turning points.
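The second and third ingredients can likewise be sketched. The helpers below are assumptions for illustration, not the paper's implementation: `is_free` stands in for whatever collision test the planner uses, and a single quadratic Bezier arc per corner is the simplest smoothing choice consistent with the abstract.

```python
def shortcut(path, is_free):
    """Greedy jump-point shortcutting: from each waypoint, jump to the
    farthest later waypoint reachable by a collision-free straight
    segment. `is_free(p, q)` is an assumed user-supplied line-of-sight
    test for the segment from p to q."""
    out = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Walk j back toward i until the straight segment is obstacle-free;
        # j == i + 1 (the original next waypoint) always succeeds.
        while j > i + 1 and not is_free(path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out

def bezier_corner(p0, p1, p2, n=8):
    """Sample a quadratic Bezier arc that rounds the corner at p1,
    starting at p0 and ending at p2, as n+1 points."""
    pts = []
    for k in range(n + 1):
        t = k / n
        x = (1 - t)**2 * p0[0] + 2 * (1 - t) * t * p1[0] + t**2 * p2[0]
        y = (1 - t)**2 * p0[1] + 2 * (1 - t) * t * p1[1] + t**2 * p2[1]
        pts.append((x, y))
    return pts
```

For example, with an always-true `is_free`, `shortcut` collapses any path to its two endpoints; with a test that only accepts axis-aligned segments, it keeps exactly the corner waypoints, each of which can then be rounded with `bezier_corner`.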

Key words: mobile robot, Q-Learning, guidance Q value, jump-point optimization, Bezier curve, smooth path

CLC number: