• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2025, Vol. 61, Issue (15): 285-296. doi: 10.3901/JME.2025.15.285

• Human Factors and Embodied Intelligence •


Embodied Augmented Reality Disassembly System for Human-Robot-Environment Integration

LÜ Jianhao, SI Jiahui, BAO Jingsong

  1. College of Mechanical Engineering, Donghua University, Shanghai 201620, China
  • Received: 2024-09-30  Revised: 2024-12-21  Published: 2025-09-28
  • About the authors: LÜ Jianhao, male, born in 1995, is a doctoral candidate. His main research interests include human-robot collaboration, augmented reality, and digital twins. E-mail: c953749@163.com. BAO Jingsong (corresponding author), male, born in 1973, holds a PhD and is a professor and doctoral supervisor. His main research interests include intelligent manufacturing and virtual reality/human-computer interaction technologies. E-mail: bao@dhu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (Grant No. 52475513).

Abstract: In human-robot collaborative disassembly, manufacturing systems predominantly rely on fixed perception-cognition paradigms governed by pre-established algorithms. This reliance makes it difficult to accommodate operators' flexible, experience-based requirements and the dynamics of the collaborative environment; as a result, robotic path planning often fails and decision-making stalls. To address this, an embodied augmented reality disassembly system for human-robot-environment integration is proposed. The system is grounded in embodied intelligence theory and centers on a "perception-cognition-execution" mechanism, which is combined with augmented reality technology to strengthen the embodied agent's environmental perception and cognitive reasoning. An embodied augmented reality collaborative disassembly strategy is designed; a local image attention model with a context-reinforcement mechanism is proposed for adaptive image captioning; a self-optimizing cognitive reasoning method is developed based on a large language model tuning and inference mechanism; and a robot manipulation method is realized through augmented reality-based human-robot-environment data interaction. Three similarity metrics are constructed to evaluate the performance of embodied perception and cognition. Quantitative and qualitative experiments demonstrate the system's feasibility and effectiveness in enhancing the efficiency and adaptability of human-robot collaborative disassembly.
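The page does not describe the authors' actual models or interfaces, so the Python sketch below only illustrates how a "perception-cognition-execution" loop of the kind summarized in the abstract might be wired together. Every function (perceive, reason, execute), data class, and value is a hypothetical stub, not the paper's implementation.

```python
# Minimal sketch of a "perception-cognition-execution" loop as described in the
# abstract. All function names and return values are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class DisassemblyAction:
    target_part: str
    tool: str


def perceive(rgb_frame) -> str:
    """Stand-in for the context-reinforced local image attention model:
    returns an adaptive caption describing the disassembly scene."""
    return "battery pack cover with four hex screws, operator hand near screw 2"


def reason(caption: str, operator_request: str) -> DisassemblyAction:
    """Stand-in for LLM-based cognitive reasoning: maps the scene caption and
    the operator's request to the next robot action."""
    return DisassemblyAction(target_part="hex screw 2", tool="electric screwdriver")


def execute(action: DisassemblyAction) -> None:
    """Stand-in for AR-mediated manipulation: the action would be visualized in
    the AR headset and dispatched to the robot controller."""
    print(f"Robot: remove {action.target_part} using {action.tool}")


if __name__ == "__main__":
    frame = None  # an RGB image from the workspace camera would go here
    caption = perceive(frame)
    action = reason(caption, operator_request="loosen the remaining screws first")
    execute(action)
```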

Key words: human-robot collaborative disassembly, embodied intelligence, augmented reality, image captioning, large language model
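The three similarity metrics used to evaluate embodied perception and cognition are not named on this page. As a purely illustrative assumption, the sketch below computes one generic candidate, a bag-of-words cosine similarity between a generated caption and a reference description, using only NumPy and the standard library.

```python
# Illustrative caption-to-reference similarity; not one of the paper's metrics.
from collections import Counter

import numpy as np


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between simple bag-of-words vectors of two texts."""
    tokens_a, tokens_b = text_a.lower().split(), text_b.lower().split()
    vocab = sorted(set(tokens_a) | set(tokens_b))
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    vec_a = np.array([count_a[w] for w in vocab], dtype=float)
    vec_b = np.array([count_b[w] for w in vocab], dtype=float)
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return float(vec_a @ vec_b / denom) if denom else 0.0


generated = "robot gripper approaches the battery cover screws"
reference = "the gripper moves toward the screws on the battery cover"
print(f"caption-reference similarity: {cosine_similarity(generated, reference):.3f}")
```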
