Gait Switching Method for Humanoid Robot Integrating Vision-language Model and Proximal Policy Optimization Algorithm

doi:10.3901/JME.2025.21.204

Abstract

Gait switching is central to a humanoid robot's ability to move seamlessly across multiple terrains. Existing methods rely predominantly on proprioception and cannot perceive features of the external environment. To address this limitation, a gait switching method is proposed that combines the semantic mapping capability of vision-language models (VLMs) with the adaptive learning of the proximal policy optimization (PPO) algorithm. First, human-like gait sequences are generated by motion retargeting with a linear mapping. Second, a PPO algorithm with reward shaping trains gait primitives, which together form a multi-terrain gait library. Third, a VLM-based gait scheduler dynamically selects suitable gait primitives. Fourth, polynomial functions constructed by Lagrange interpolation constrain the joint trajectories, so that gait transitions are smooth and adaptive. Finally, experiments on autonomous gait switching in representative scenarios validate the effectiveness of the proposed method.
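
The abstract states that polynomial functions built by Lagrange interpolation constrain the joint trajectories during a gait transition, but gives no formula. The following is a minimal sketch of that general idea for a single joint, not the authors' implementation. The function name, the choice of three knot points, and all times and angles are hypothetical values chosen only for illustration.

```python
from typing import Sequence


def lagrange_interpolate(ts: Sequence[float], qs: Sequence[float], t: float) -> float:
    """Evaluate at time t the Lagrange polynomial passing through (ts[i], qs[i])."""
    total = 0.0
    for i, (ti, qi) in enumerate(zip(ts, qs)):
        # Basis polynomial L_i(t): equals 1 at ts[i] and 0 at every other knot.
        basis = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                basis *= (t - tj) / (ti - tj)
        total += qi * basis
    return total


# Hypothetical knots for one joint (time in s, angle in rad):
# last pose of the outgoing gait, an intermediate pose, first pose of the incoming gait.
knot_times = [0.0, 0.25, 0.5]
knot_angles = [0.30, 0.45, 0.80]

# Sample the transition trajectory every 0.05 s over the 0.5 s window.
trajectory = [lagrange_interpolate(knot_times, knot_angles, k * 0.05) for k in range(11)]
```

Because the polynomial passes exactly through every knot, the sampled trajectory starts at the outgoing gait's pose and ends at the incoming gait's pose. How the paper selects knot points and polynomial order is not stated in the abstract.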

Key words: humanoid robot, gait switching, vision-language model, reinforcement learning

CLC Number: TP242

DU Guofeng, SHAO Shibo, LI Shanglin, LIN Chengran, CAO Zhengcai. Gait Switching Method for Humanoid Robot Integrating Vision-language Model and Proximal Policy Optimization Algorithm[J]. Journal of Mechanical Engineering, 2025, 61(21): 204-212.
