• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2025, Vol. 61, Issue (21): 204-212. doi: 10.3901/JME.2025.21.204

• Invited Column: In Commemoration of the 100th Anniversary of the Birth of Academician Zhang Qixian


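  • Author biographies: DU Guofeng, male, born in 1998, Ph.D. candidate. His main research interest is motion control of humanoid robots. E-mail: 2023400216@buct.edu.cn
    CAO Zhengcai (corresponding author), male, born in 1974, Ph.D., professor and doctoral supervisor. His main research interests include artificial intelligence algorithms, embodied intelligence for robots, and humanoid robots. E-mail: caozc@hit.edu.cn
  • Funding:
    Supported by the Xiaomi Innovation Joint Fund of the Beijing Natural Science Foundation (L223019, L243004), the Beijing Natural Science Foundation (3242011), and the Natural Science Foundation of Heilongjiang Province (ZD2024E003).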

Gait Switching Method for Humanoid Robot Integrating Vision-language Model and Proximal Policy Optimization Algorithm

DU Guofeng1, SHAO Shibo1, LI Shanglin1, LIN Chengran2, CAO Zhengcai2   

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029;
    2. State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150006
  • Received:2025-02-28 Revised:2025-06-23 Published:2025-12-27


Abstract: Gait switching is central to enabling humanoid robots to move continuously across multiple terrains. Existing methods rely predominantly on proprioception and lack the ability to understand features of the external environment. To address this, a gait switching method is proposed that integrates the semantic mapping capability of vision-language models (VLMs) with the adaptive learning characteristics of the proximal policy optimization (PPO) algorithm. First, human-like gait sequences are generated by motion retargeting based on a linear mapping. Second, gait primitives are trained with a reward-shaped PPO algorithm to build a multi-terrain gait library. Third, a VLM-based gait scheduler is designed to dynamically match suitable gait primitives. Then, polynomial functions constructed via Lagrange interpolation constrain the joint trajectories, enabling smooth and adaptive gait transitions. Finally, autonomous gait switching experiments in representative scenarios validate the effectiveness of the proposed method.
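The abstract states only that motion retargeting uses a linear mapping. A minimal sketch of that idea, assuming a per-joint affine map q_robot = scale * q_human + offset followed by clamping to the robot's joint limits. The class `JointMap`, the joint names, coefficients, and limits are all hypothetical illustrations, not values from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JointMap:
    """Hypothetical per-joint linear map: q_robot = scale * q_human + offset (rad)."""
    scale: float
    offset: float
    lower: float  # robot joint lower limit (rad)
    upper: float  # robot joint upper limit (rad)


def retarget_frame(human: Dict[str, float], maps: Dict[str, JointMap]) -> Dict[str, float]:
    """Map one frame of human joint angles to robot joint angles, clamped to joint limits."""
    robot: Dict[str, float] = {}
    for name, m in maps.items():
        q = m.scale * human[name] + m.offset
        robot[name] = min(max(q, m.lower), m.upper)
    return robot


def retarget_sequence(frames: List[Dict[str, float]],
                      maps: Dict[str, JointMap]) -> List[Dict[str, float]]:
    """Retarget a whole human motion clip frame by frame."""
    return [retarget_frame(frame, maps) for frame in frames]


# Illustrative usage with made-up joints, coefficients, and limits.
maps = {
    "hip_pitch": JointMap(scale=1.0, offset=0.0, lower=-1.5, upper=1.5),
    "knee": JointMap(scale=0.9, offset=0.05, lower=0.0, upper=2.2),
}
clip = [{"hip_pitch": 0.3, "knee": 0.6}, {"hip_pitch": -2.0, "knee": -0.1}]
print(retarget_sequence(clip, maps))
```

Clamping after the linear map keeps every retargeted frame physically reachable, at the cost of flattening motion segments that exceed the robot's range.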
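For the reward-shaped PPO step, the sketch below assumes a shaped reward built as a weighted sum of hypothetical terms (velocity tracking, imitation of the retargeted reference gait, and an energy penalty), then computes generalized advantage estimates and the standard PPO clipped surrogate objective. The reward terms, weights, and function names are illustrative assumptions; the paper's actual reward design is not given in the abstract.

```python
import math
from typing import List, Sequence


def shaped_reward(vel_err: float, imitation_err: float, torque_sq: float,
                  w_vel: float = 1.0, w_imit: float = 0.5, w_energy: float = 1e-3) -> float:
    """Hypothetical shaped reward: two exponential tracking terms minus an energy penalty.

    vel_err: base-velocity tracking error; imitation_err: deviation from the retargeted
    reference gait; torque_sq: sum of squared joint torques. All weights are assumptions.
    """
    return (w_vel * math.exp(-vel_err ** 2)
            + w_imit * math.exp(-imitation_err ** 2)
            - w_energy * torque_sq)


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation for one trajectory.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`. Pass 0.0 as the
    last value if the episode terminated, or the critic's estimate if it was truncated.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)       # pessimistic (lower) bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to push the probability ratio outside [1 - eps, 1 + eps] in the direction that would increase the objective, which is what keeps each policy update conservative.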
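The abstract does not specify how the VLM output is turned into a gait choice. One plausible pattern, sketched here under stated assumptions: prompt the VLM with the fixed list of primitive names, accept an answer only if it names a primitive in the library, and require a few consecutive identical answers before switching so that a single misclassification does not trigger a switch. `query_vlm` is a hypothetical callable standing in for whatever VLM interface is used; the primitive names in `GAIT_LIBRARY`, the prompt wording, and the confirmation count are also assumptions.

```python
from typing import Callable, Optional, Sequence

# Hypothetical primitive names; the paper's actual gait library is not listed in the abstract.
GAIT_LIBRARY = ("flat_walk", "stairs_up", "stairs_down", "slope")


def build_prompt(options: Sequence[str]) -> str:
    """Constrain the VLM to answer with one primitive name."""
    return ("Describe the terrain ahead of the robot by answering with exactly one of: "
            + ", ".join(options) + ".")


class GaitScheduler:
    """Sketch of a VLM-driven gait scheduler with answer validation and debouncing."""

    def __init__(self, query_vlm: Callable[[bytes, str], str], library: Sequence[str],
                 initial: str, confirm: int = 2) -> None:
        self.query_vlm = query_vlm      # hypothetical: (image bytes, prompt) -> text answer
        self.library = tuple(library)
        self.current = initial
        self.confirm = confirm          # consecutive identical answers required to switch
        self._candidate: Optional[str] = None
        self._count = 0

    def _parse(self, answer: str) -> Optional[str]:
        """Return the first library name mentioned in the answer, or None if none is."""
        text = answer.strip().lower()
        return next((name for name in self.library if name in text), None)

    def step(self, image: bytes) -> str:
        choice = self._parse(self.query_vlm(image, build_prompt(self.library)))
        if choice is None or choice == self.current:
            # Unusable answer or no change requested: keep the current gait.
            self._candidate, self._count = None, 0
            return self.current
        if choice == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = choice, 1
        if self._count >= self.confirm:
            self.current = choice
            self._candidate, self._count = None, 0
        return self.current
```

The confirmation count trades a short switching delay for robustness to isolated VLM errors; whether the paper uses any such filtering is not stated in the abstract.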
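The transition step can be illustrated with plain Lagrange interpolation of a single joint angle. This sketch assumes the knots are the last angle of the outgoing primitive, one hypothetical via-point, and the first angle of the incoming primitive; the knot times, angles, and function names are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def lagrange_eval(ts: Sequence[float], qs: Sequence[float], t: float) -> float:
    """Evaluate the Lagrange polynomial through the knots (ts[i], qs[i]) at time t.

    Knot times must be distinct; n knots give a polynomial of degree at most n - 1.
    """
    total = 0.0
    for i, (ti, qi) in enumerate(zip(ts, qs)):
        basis = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                basis *= (t - tj) / (ti - tj)   # L_i(t) is 1 at t_i and 0 at the other knots
        total += qi * basis
    return total


def transition_trajectory(ts: Sequence[float], qs: Sequence[float],
                          num_samples: int) -> List[float]:
    """Sample the interpolated joint angle uniformly from the first to the last knot time.

    num_samples must be at least 2.
    """
    t0, t1 = ts[0], ts[-1]
    step = (t1 - t0) / (num_samples - 1)
    return [lagrange_eval(ts, qs, t0 + k * step) for k in range(num_samples)]


# Illustrative knots for one joint (s, rad): end of the outgoing gait, a via-point,
# and start of the incoming gait. The values are made up.
knot_times = [0.0, 0.25, 0.5]
knot_angles = [0.20, 0.45, 0.40]
print(transition_trajectory(knot_times, knot_angles, num_samples=6))
```

Plain Lagrange interpolation matches positions at the knots only; matching joint velocities at the switching instants as well would need additional constraints (for example Hermite-type conditions), and high-degree polynomials through many knots can oscillate between them (Runge's phenomenon).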

Key words: humanoid robot, gait switching, vision-language model, reinforcement learning
