• CN:11-2187/TH
  • ISSN:0577-6686

机械工程学报 (Journal of Mechanical Engineering) ›› 2022, Vol. 58 ›› Issue (11): 72-87. doi: 10.3901/JME.2022.11.072

• Robotics and Mechanisms •


A Q-Learning Path Optimization Algorithm with Smoothing and Shortcutting Capabilities

DUAN Shuyong1, ZHANG Linxin1, HAN Xu1, LIU Guirong2

  1. State Key Laboratory of Reliability and Intelligence of Electrical Equipment (Hebei University of Technology), Tianjin 300401, China;
    2. Department of Aeronautical Engineering and Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
  • Received: 2021-07-22  Revised: 2022-01-24  Online: 2022-06-05  Published: 2022-08-08
  • Corresponding author: HAN Xu, male, born in 1968, PhD, professor, doctoral supervisor. His main research interests include reliability of complex equipment and systems, computational inverse techniques, and optimization theory and algorithms. E-mail: xhan@hebut.edu.cn
  • About the authors: DUAN Shuyong, female, born in 1984, PhD, associate professor, master's supervisor. Her main research interests include robot dynamics, robot reliability, and computational inverse techniques. E-mail: duanshuyong@hebut.edu.cn; ZHANG Linxin, male, born in 1997, master's student. His main research interest is mobile robot path planning. E-mail: zlinxin@126.com; LIU Guirong, male, born in 1958, PhD, professor, doctoral supervisor. His main research interests include computational inverse techniques, robot intelligent perception, intelligent algorithms, and optimization theory and algorithms. E-mail: xingtianliu@sjtu.edu.cn
  • Supported by: National Natural Science Foundation of China (52175222), Key R&D Program of Hebei Province (19227208D), and Tianjin Science and Technology Program (19ZXZNGX00100)

Smoothed-shortcut Q-Learning Algorithm for Optimal Robot Agent Path Planning

DUAN Shuyong1, ZHANG Linxin1, HAN Xu1, LIU Guirong2   

  1. State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China;
    2. Aeronautical Engineering and Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
  • Received:2021-07-22 Revised:2022-01-24 Online:2022-06-05 Published:2022-08-08

Abstract (translated from Chinese): Rational path planning is the key to a mobile robot completing its tasks safely and efficiently. Most existing path-planning algorithms plan only after global environment information is fully known. To address path planning for mobile robots in static unknown environments, a smoothed-shortcut Q-Learning (SSQL) algorithm is therefore proposed and applied to mobile robot path planning. The algorithm improves the Agent's learning efficiency while ensuring that the path is a smooth, continuous, shortest curve, improving the robot's locomotion dynamics and efficiency. The SSQL algorithm comprises three main new schemes. First, the unknown environment is pre-explored with the Q-Learning algorithm; once the Agent first reaches the goal, a virtual rectangular environment is constructed from the pre-exploration information, and guidance Q values are added inside it to improve the Agent's learning efficiency. Second, the path found by the Agent is optimized by jump points, eliminating redundant segments, reducing turning points, and shortening the path length. Third, Bezier curves are used to smooth the path at its turning positions, so that the final path satisfies the mobile robot's dynamic constraints. The algorithm is compared with the Q-Learning algorithm in different environments. The results show that the SSQL path-planning algorithm performs excellently in exploring large unknown environments, offering fast convergence, short planned paths, and few turning points, while ensuring smooth and safe traversal of the planned path by the mobile robot.
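The guidance-Q-value scheme in the first step can be illustrated with a minimal tabular sketch. This is not the paper's code: the function names, the uniform bonus value, and the action set are assumptions; the paper derives its guidance values from the pre-exploration episode, while here a fixed bonus is simply seeded inside the virtual rectangle spanned by the start and goal cells.

```python
def guided_q_init(width, height, start, goal, bonus=0.5):
    """Build a tabular Q-function for a grid world, seeding a guidance
    bonus inside the virtual rectangle spanned by start and goal.
    (Illustrative only; the published guidance scheme may differ.)"""
    (x0, y0), (x1, y1) = start, goal
    xmin, xmax = sorted((x0, x1))
    ymin, ymax = sorted((y0, y1))
    q = {}
    for x in range(width):
        for y in range(height):
            inside = xmin <= x <= xmax and ymin <= y <= ymax
            # States inside the rectangle start with a positive Q value,
            # biasing the Agent's greedy choices toward the goal region.
            q[(x, y)] = {a: (bonus if inside else 0.0)
                         for a in ("up", "down", "left", "right")}
    return q

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-Learning update."""
    q[s][a] += alpha * (r + gamma * max(q[s_next].values()) - q[s][a])
```

Because the seeded values act only as optimistic initial estimates, the usual Q-Learning convergence behavior is preserved while exploration is steered toward the rectangle.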

Keywords: mobile robot, Q-Learning, guidance Q value, jump-point optimization, Bezier curve, path smoothing

Abstract: Quality path planning is the key for a mobile robot to complete its task safely, efficiently and smoothly. Such planning must often be done in an environment that is initially unknown to the Agent, so effective reinforcement learning is required. A smoothed-shortcut Q-Learning (SSQL) algorithm is presented that enables the Agent to learn, and then work out, a smooth shortcut path to a goal that is initially unknown in a given environment. SSQL is proposed so that a mobile robot can effectively reach its goal in an unfamiliar environment along a path that is a smooth, continuous curve of shortest distance. The SSQL algorithm consists of three major ingredients. First, a virtual rectangular boundary of the environment is constructed from the pre-explored information, and the guidance Q values inside this rectangle are increased to improve the learning efficiency of the Agent. Second, the path found by the Agent is optimized by finding shortcuts along it, eliminating redundant segments and reducing zig-zag portions, thereby minimizing the total distance between the start and target points. Third, Bezier curves are used to smooth the path at its turning positions, improving the dynamics of the robot's movement. The final path generated by the SSQL algorithm is thus optimized for convergence speed, smoothness and distance. The SSQL algorithm is tested against the standard Q-Learning algorithm in environments with various obstacle densities and learning rates. The results show that SSQL indeed achieves fast convergence and short, smooth paths with few turning points.
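The second and third ingredients can likewise be sketched. The helpers below are assumptions for illustration, not the paper's implementation: `is_free` stands in for whatever collision test the planner uses, and a single quadratic Bezier arc per corner is the simplest smoothing choice consistent with the abstract.

```python
def shortcut(path, is_free):
    """Greedy jump-point shortcutting: from each waypoint, jump to the
    farthest later waypoint reachable by a collision-free straight
    segment. `is_free(p, q)` is an assumed user-supplied line-of-sight
    test for the segment from p to q."""
    out = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Walk j back toward i until the straight segment is obstacle-free;
        # j == i + 1 (the original next waypoint) always succeeds.
        while j > i + 1 and not is_free(path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out

def bezier_corner(p0, p1, p2, n=8):
    """Sample a quadratic Bezier arc that rounds the corner at p1,
    starting at p0 and ending at p2, as n+1 points."""
    pts = []
    for k in range(n + 1):
        t = k / n
        x = (1 - t)**2 * p0[0] + 2 * (1 - t) * t * p1[0] + t**2 * p2[0]
        y = (1 - t)**2 * p0[1] + 2 * (1 - t) * t * p1[1] + t**2 * p2[1]
        pts.append((x, y))
    return pts
```

For example, with an always-true `is_free`, `shortcut` collapses any path to its two endpoints; with a test that only accepts axis-aligned segments, it keeps exactly the corner waypoints, each of which can then be rounded with `bezier_corner`.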

Key words: mobile robot, Q-Learning, guidance Q value, jump-point optimization, Bezier curve, smooth path

CLC number: