Chain-of-thought Paradigm Text-based Multimodal Intelligent Agent for Equipment Operations and Maintenance

doi:10.3901/JME.2025.23.058

Abstract

A chain-of-thought (CoT) paradigm text-based multimodal intelligent agent, an artificial intelligence architecture for the operation and maintenance (O&M) of mechanical equipment, is proposed. First, real engineering applications make it difficult to build high-quality, large-scale datasets that map monitoring data to fault modes. To address this, a CoT dataset construction strategy is proposed that links monitoring signals, mathematical features, text descriptions, and fault modes. On this basis, a signal-to-text (Sig2Txt) model driven by a signal-text data generator is developed. Next, a high-quality specialized text dataset for the O&M of mechanical equipment is created. An O&M-specialized large language model is then obtained by instruction fine-tuning a general-purpose large language model. Finally, these models are integrated through large-model agent technology to form a CoT paradigm text-based multimodal intelligent agent for intelligent O&M. This integration is guided by the reasoning patterns that human experts follow in equipment maintenance. Test results indicate that the model performs chain-of-thought parsing and mapping from multimodal input to decision-making. Its accuracy exceeds 70% on ISO Category III vibration analyst test questions, which reaches expert-level performance. In evaluations on engineering cases and publicly available multimodal datasets, the proposed model outperforms existing mainstream large models. More importantly, the model shows considerable scalability and development potential. This potential comes from two sources: the proposed low-cost framework for constructing large-scale, high-quality multimodal CoT datasets, and the advantages of the text-based approach in knowledge coverage, high-level abstraction, and interpretability.
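The dataset construction chain described above runs from monitoring signal, to mathematical features, to text description, to fault mode. It can be pictured with a minimal Python sketch. This is not the authors' Sig2Txt model, which is a learned signal-to-text model. All of the following are hypothetical choices for illustration:

- the features (RMS, crest factor, kurtosis, dominant frequency);
- the kurtosis threshold of 3.5;
- the sentence template;
- the function names `extract_features`, `features_to_text`, and `build_cot_sample`.

The fault-mode label is assumed to come from an existing labelled dataset rather than being inferred.

```python
import cmath
import math
import random


def extract_features(signal, fs):
    """Compute a few classical time- and frequency-domain features of a vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    rms = math.sqrt(sum(x * x for x in centered) / n)
    if rms == 0:
        raise ValueError("constant signal: features are undefined")
    peak = max(abs(x) for x in centered)
    # Non-excess kurtosis: 3 for Gaussian noise, 1.5 for a pure sine, larger for impulsive signals.
    kurtosis = (sum(x ** 4 for x in centered) / n) / rms ** 4
    # Naive DFT over bins 1..n/2-1 to find the dominant frequency (O(n^2), fine for short records).
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        s = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_bin, best_mag = k, abs(s)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": kurtosis,
        "dominant_freq_hz": best_bin * fs / n,
    }


def features_to_text(feat, shaft_hz):
    """Render features as a plain-language description (hypothetical template and threshold)."""
    order = feat["dominant_freq_hz"] / shaft_hz
    impulsive = "impulsive" if feat["kurtosis"] > 3.5 else "not notably impulsive"
    return (
        f"RMS {feat['rms']:.3f}, crest factor {feat['crest_factor']:.2f}, "
        f"kurtosis {feat['kurtosis']:.2f} ({impulsive}); "
        f"dominant component at {feat['dominant_freq_hz']:.1f} Hz, "
        f"about {order:.2f}x the shaft rotation frequency."
    )


def build_cot_sample(signal, fs, shaft_hz, fault_mode):
    """Chain one labelled signal into a signal -> features -> text -> fault-mode record."""
    feat = extract_features(signal, fs)
    return {
        "features": feat,
        "description": features_to_text(feat, shaft_hz),
        "fault_mode": fault_mode,  # taken from the source dataset's label, not inferred here
    }


if __name__ == "__main__":
    fs, n, shaft_hz = 1024, 256, 32.0
    random.seed(0)
    sig = [math.sin(2 * math.pi * shaft_hz * t / fs) + 0.05 * random.gauss(0, 1) for t in range(n)]
    sample = build_cot_sample(sig, fs, shaft_hz, fault_mode="unbalance")
    print(sample["description"])
    print("->", sample["fault_mode"])
```

A template like this produces text that is cheap to generate and easy to audit. A learned generator, as the abstract describes, would replace the fixed template with model-produced descriptions.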

Key words: intelligent operation and maintenance, multimodal model, large language model, fault diagnosis, intelligent agent

CLC Number: TH17

HUANG Jinfeng, WANG Chengcheng, HE Hongliang, WANG Xu, LI Qi, YANG Kangding, WANG Kai, ZHANG Feibin, QIN Zhaoye, CHU Fulei. Chain-of-thought Paradigm Text-based Multimodal Intelligent Agent for Equipment Operations and Maintenance[J]. Journal of Mechanical Engineering, 2025, 61(23): 58-74.
