• CN: 11-2187/TH
  • ISSN: 0577-6686

Journal of Mechanical Engineering ›› 2022, Vol. 58 ›› Issue (11): 72-87. doi: 10.3901/JME.2022.11.072


Smoothed-shortcut Q-Learning Algorithm for Optimal Robot Agent Path Planning

DUAN Shuyong1, ZHANG Linxin1, HAN Xu1, LIU Guirong2   

  1. State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401;
    2. Aeronautical Engineering and Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
  • Received: 2021-07-22  Revised: 2022-01-24  Online: 2022-06-05  Published: 2022-08-08

Abstract: High-quality path planning for a mobile robot in operation is key to completing its task safely, efficiently and smoothly. Such path planning must often be carried out in a given environment that is initially unknown to the agent, so an effective reinforcement learning method is required. A smoothed-shortcut Q-learning (SSQL) algorithm is presented that enables the agent to learn and then determine a smoothed, short-cut path to a goal that is initially unknown to it in a given environment. The SSQL algorithm is proposed to solve the practical problem of a mobile robot effectively reaching its goal in an unfamiliar environment along a path that is a smooth, continuous curve of the shortest distance. The SSQL algorithm consists of three major ingredients. First, a virtual rectangular boundary of the environment is constructed from the pre-explored information, and the Q values of the guidance points on this virtual rectangular boundary are increased to improve the learning efficiency of the agent. Second, the path found by the agent at the current time is optimized by finding short-cuts along it, eliminating possible redundant segments and reducing zig-zag segments, thereby minimizing the total distance between the starting and target points. Third, at the turning points on the path, Bezier curves are used to further smooth the path, so as to improve the dynamics of the robot agent's movement. The final path generated by the SSQL algorithm is optimal in terms of fast convergence, smoothness and shortest distance. The SSQL algorithm is tested by comparison with the standard Q-learning algorithm in different environments with various obstacle densities and learning rates. The results show that the SSQL algorithm indeed achieves fast convergence and short, smooth paths with few turning points.
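The second and third ingredients of the abstract (short-cut elimination of redundant zig-zag segments, then Bezier smoothing at turning points) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the grid path is assumed to come from a prior Q-learning run, and the helper names `line_of_sight`, `shortcut` and `bezier_smooth` are illustrative. The sketch uses quadratic Bezier curves anchored at the midpoints of the two segments meeting at each corner, which is one common way to round turning points.

```python
import numpy as np

def line_of_sight(p, q, obstacles, steps=50):
    """Hypothetical helper: sample the straight segment p -> q and check
    that no sample falls inside an obstacle grid cell."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    for t in np.linspace(0.0, 1.0, steps):
        x, y = p + t * (q - p)
        if (int(round(x)), int(round(y))) in obstacles:
            return False
    return True

def shortcut(path, obstacles):
    """Greedy short-cut (jump point) optimization: from each waypoint,
    jump straight to the farthest later waypoint that is still visible,
    dropping the redundant zig-zag waypoints in between."""
    out = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_of_sight(path[i], path[j], obstacles):
            j -= 1
        out.append(path[j])
        i = j
    return out

def bezier_smooth(path, n=10):
    """Round each interior turning point with a quadratic Bezier curve:
    the curve runs between the midpoints of the incoming and outgoing
    segments, with the corner itself as the control point."""
    if len(path) < 3:
        return [tuple(map(float, p)) for p in path]
    pts = [np.asarray(p, float) for p in path]
    out = [tuple(pts[0])]
    for k in range(1, len(pts) - 1):
        p0 = (pts[k - 1] + pts[k]) / 2   # midpoint entering the corner
        p1 = pts[k]                      # control point = the corner
        p2 = (pts[k] + pts[k + 1]) / 2   # midpoint leaving the corner
        for t in np.linspace(0.0, 1.0, n):
            b = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
            out.append(tuple(b))
    out.append(tuple(pts[-1]))
    return out
```

For example, a zig-zag grid path `[(0,0), (1,0), (1,1), (2,1), (2,2), (3,2)]` in free space collapses under `shortcut` to the straight segment `[(0,0), (3,2)]`, while `bezier_smooth` replaces each of its right-angle corners with a short curved arc whose endpoints match the original start and goal.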

Key words: mobile robot, Q-Learning, Q value of the guidance, jump point optimization, Bezier curve, smooth path
