• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2025, Vol. 61, Issue (23): 58-74. doi: 10.3901/JME.2025.23.058

• Machinery Dynamics •

Chain-of-thought Paradigm Text-based Multimodal Intelligent Agent for Equipment Operations and Maintenance

HUANG Jinfeng1, WANG Chengcheng2, HE Hongliang1,3, WANG Xu4, LI Qi1, YANG Kangding3, WANG Kai2, ZHANG Feibin1, QIN Zhaoye1, CHU Fulei1

  1. State Key Laboratory of Tribology in Advanced Equipment, Department of Mechanical Engineering, Tsinghua University, Beijing 100084;
  2. Standards and Testing Center, Instrumentation Technology and Economy Institute, Beijing 100055;
  3. FreqX Intelligence Technology Co., Ltd., Changzhou 213162;
  4. Intelligent Game and Decision Lab, Beijing 100091
  • Received: 2025-01-09 Revised: 2025-09-06 Online: 2025-12-05 Published: 2026-01-22
  • About the authors: HUANG Jinfeng, female, born in 1991, PhD. Her main research interests are machinery fault diagnosis and intelligent operation and maintenance. E-mail: hjinfeng1991@163.com
    ZHANG Feibin (corresponding author), male, born in 1986, PhD, assistant researcher. His main research interests are machinery dynamics, signal processing, and intelligent operation and maintenance. E-mail: zfeibin@mail.tsinghua.edu.cn
    CHU Fulei, male, born in 1959, PhD, professor and doctoral supervisor. His main research interests are dynamic analysis and fault diagnosis of large rotating machinery. E-mail: chufl@mail.tsinghua.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (52305115, 52105109, 52161135101) and the China Postdoctoral Science Foundation (2023M741938).

Abstract: This paper proposes an artificial intelligence architecture for the operation and maintenance (O&M) of mechanical equipment: a chain-of-thought (CoT) paradigm text-based multimodal intelligent agent. First, because high-quality, large-scale datasets that map monitoring data directly to fault modes are difficult to construct in real engineering applications, a chain-of-thought dataset construction approach is established that links monitoring signals, mathematical features, text descriptions, and fault modes; on this basis, a signal-to-text (Sig2Txt) model driven by a signal-text data generator is proposed. Second, a high-quality specialized text dataset for mechanical equipment O&M is built, and a specialized O&M large language model is established by instruction fine-tuning a general-purpose large language model. Finally, these models are integrated using large-model agent technology, guided by the way human experts reason when carrying out equipment O&M, to create the CoT paradigm text-based multimodal intelligent agent. Test results show that the agent performs full-process chain-of-thought parsing and mapping from multimodal input data to O&M decisions, and achieves an accuracy above 70% on ISO Category III vibration analyst test questions, reaching the level of human experts. In a "warm-start" evaluation on a multimodal dataset composed of engineering cases and public data, the model outperforms existing mainstream large models. More importantly, owing to the proposed low-cost, high-quality framework for constructing large-scale multimodal CoT datasets, and to the inherent advantages of a text-based representation in covering broad knowledge, condensing it, and remaining easy to interpret, the model shows considerable scalability and development potential.

Key words: intelligent operation and maintenance, multimodal model, large language model, fault diagnosis, intelligent agent
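
The "monitoring signal, mathematical features, text description, fault mode" chain named in the abstract can be illustrated with a toy sketch. This is not the authors' Sig2Txt model or agent: the function names, the chosen features (RMS, crest factor, kurtosis), the thresholds, and the rule-based final step are all assumptions made only to show how each link in the chain passes an interpretable intermediate result to the next.

```python
import math


def extract_features(signal):
    """Signal -> mathematical features (hypothetical feature set)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    var = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    fourth = sum(c ** 4 for c in centered) / n
    return {
        "rms": rms,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": fourth / (var ** 2) if var > 0 else 0.0,
    }


def describe(features, kurtosis_limit=4.0, crest_limit=4.0):
    """Features -> text description. Thresholds are illustrative assumptions."""
    impulsive = (features["kurtosis"] > kurtosis_limit
                 and features["crest_factor"] > crest_limit)
    label = "impulsive" if impulsive else "non-impulsive"
    return (f"RMS {features['rms']:.2f}, crest factor "
            f"{features['crest_factor']:.2f}, kurtosis "
            f"{features['kurtosis']:.2f}; the waveform is {label}.")


def infer_fault_mode(description):
    """Text description -> fault mode. A rule stands in for the language model."""
    # Check the negative label first: "impulsive" is a substring of it.
    if "non-impulsive" in description:
        return "no impulsive fault indicated"
    return "possible localized defect (e.g. bearing or gear damage)"


if __name__ == "__main__":
    smooth = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
    spiky = [0.0] * 99 + [10.0]
    for name, sig in (("smooth", smooth), ("spiky", spiky)):
        text = describe(extract_features(sig))
        print(name, "|", text, "|", infer_fault_mode(text))
```

Because every intermediate result is plain text or a named number, each step of the reasoning can be inspected, which is the interpretability advantage the abstract attributes to a text-based representation.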
