Research and System Implementation of Quadruped Robot Following Strategy Based on Deep Reinforcement Learning

doi:10.3901/JME.2023.13.079

Journal of Mechanical Engineering, 2023, Vol. 59, Issue 13: 79-88


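ZHONG Peicheng, LUO Deyuan, PANG Mingjun

School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731

Received: 2022-08-26  Revised: 2023-05-17  Online: 2023-07-05  Published: 2023-08-15
Corresponding author: LUO Deyuan, male, born in 1970, PhD, professor, and master's supervisor. His main research interest is intelligent robot technology. E-mail: luodeyuan@163.com
Author profile: ZHONG Peicheng, male, born in 1997. His main research interests are robot environmental perception and autonomous decision-making. E-mail: zhongpc2019@163.com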


Abstract

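The target following strategy is a key component of a quadruped robot's target following system. During following, the target's motion involves many random factors, system decision-making is complex, and real-world deployment lacks robustness. To address these problems, a target following strategy based on deep reinforcement learning is first proposed: taking as input the spatial position of the target relative to the robot, it outputs following action commands, so that the robot can make following decisions for a randomly moving target. The robot is then trained with a deep reinforcement learning algorithm based on the Actor-Critic framework. Observation noise is added during training to obtain a more robust following strategy, and a correction factor is introduced to reduce the deviation between the robot's motion speed in the simulated environment and in the real environment. The strategy is first verified preliminarily on a simulation platform and finally deployed on a quadruped robot for experimental verification. The results show that the system achieves good following performance and meets the needs of most application scenarios.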

Keywords: quadruped robot, target following system, deep reinforcement learning, target following strategy

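The abstract names three mechanisms: an observation built from the target's position relative to the robot, observation noise added for robustness, and a correction factor for the speed gap between simulation and the real robot. The Python sketch below shows one plausible form of each; it is not the authors' implementation. The body-frame (distance, bearing) encoding, the Gaussian noise and its standard deviations, the multiplicative speed model (real speed approximately k times commanded speed), the least-squares fit for k, the speed limit, and all function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical noise levels: distance in metres, bearing in radians.
DEFAULT_SIGMAS = (0.05, 0.02)


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def relative_observation(robot_xy, robot_yaw, target_xy):
    """Return the target position in the robot's body frame as (distance, bearing).

    Assumed encoding: bearing 0 is straight ahead, positive bearing is to the left.
    """
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    # Rotate the world-frame offset by -yaw into the body frame.
    x_b = c * dx + s * dy
    y_b = -s * dx + c * dy
    return (math.hypot(x_b, y_b), math.atan2(y_b, x_b))


def add_observation_noise(obs, rng, sigmas=DEFAULT_SIGMAS):
    """Perturb (distance, bearing) with zero-mean Gaussian noise (assumed: training only)."""
    dist, bearing = obs
    noisy_dist = max(0.0, dist + rng.gauss(0.0, sigmas[0]))  # keep distance non-negative
    noisy_bearing = wrap_angle(bearing + rng.gauss(0.0, sigmas[1]))
    return (noisy_dist, noisy_bearing)


def fit_correction_factor(commanded, measured):
    """Least-squares k for the assumed model: measured_speed ~= k * commanded_speed."""
    num = sum(c * m for c, m in zip(commanded, measured))
    den = sum(c * c for c in commanded)
    if den == 0.0:
        raise ValueError("need at least one non-zero commanded speed")
    return num / den


def corrected_command(v_policy, k, v_max=1.5):
    """Scale a policy speed by 1/k so the real robot moves at roughly v_policy.

    v_max is a hypothetical speed limit used to clip the scaled command.
    """
    v = v_policy / k
    return max(-v_max, min(v, v_max))


if __name__ == "__main__":
    rng = random.Random(0)
    obs = relative_observation((0.0, 0.0), 0.0, (2.0, 1.0))
    print("clean obs:", obs)
    print("noisy obs:", add_observation_noise(obs, rng))
    # Hypothetical calibration data: commanded vs. measured forward speed (m/s).
    k = fit_correction_factor([0.3, 0.6, 0.9], [0.25, 0.5, 0.74])
    print("k =", round(k, 3), "command for 0.6 m/s:", corrected_command(0.6, k))
```

Under this assumed model, noise perturbs only what the policy sees during training, so the learned policy tolerates imperfect target localization, while the correction factor acts on the policy's output at deployment: a fitted k below 1 means the real robot moves slower than commanded, so commands are scaled up before being sent to the locomotion controller.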

Chinese Library Classification (CLC) number: TG156

Cite this article: ZHONG Peicheng, LUO Deyuan, PANG Mingjun. Research and System Implementation of Quadruped Robot Following Strategy Based on Deep Reinforcement Learning[J]. Journal of Mechanical Engineering, 2023, 59(13): 79-88.

