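Dynamic Scheduling Optimization of Island Assembly Lines Under Uncertain Disturbances by Multi-objective Deep Reinforcement Learning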

Journal of Mechanical Engineering, 2026, Vol. 62, Issue (5): 74-87. doi: 10.3901/JME.260229

Special Column: Information-Driven Final Assembly Pull Production Mode, Technology and Application


HUANG Ming¹, HUANG Sihan^1,2, CHEN Jianpeng¹, DONG Wei¹, WANG Baicun³, RUAN Bing⁴, GAO Yunpeng⁵, WANG Guoxin^1,2, YAN Yan^1,2

1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081;
2. Key Laboratory of Industry Knowledge & Data Fusion Technology and Application, Ministry of Industry and Information Technology, Beijing Institute of Technology, Beijing 100081;
3. School of Mechanical Engineering, Zhejiang University, Hangzhou 310058;
4. Automotive Engineering Corporation, Tianjin 300113;
5. SINOMACH Intelligence Technology Research Institute Co., Ltd., Beijing 100013

Received: 2025-02-25 Revised: 2025-07-15 Published: 2026-04-23
About the authors: HUANG Ming, male, born in 1999, PhD candidate. His main research interests are scheduling and dynamic optimization of intelligent manufacturing systems. E-mail: huangming@bit.edu.cn
HUANG Sihan (corresponding author), male, born in 1991, PhD, distinguished research fellow, doctoral supervisor. His main research interests are embodied-intelligence reconfigurable manufacturing, human-centric smart manufacturing, and digital-twin-based management, control and optimization. E-mail: hsh@bit.edu.cn
CHEN Jianpeng, male, born in 2000, master's student. His main research interest is human-machine collaborative operation. E-mail: 2393225727@qq.com
DONG Wei, male, born in 2002, master's student. His main research interests are intelligent simulation and optimization of intelligent manufacturing systems. E-mail: 1635611987@qq.com
Funding:
Supported by the National Key Research and Development Program of China (2024YFB3309801), the National Natural Science Foundation of China (52405530), the Beijing Natural Science Foundation key research topic (L243009), and the Beijing Institute of Technology Research Fund Program for Young Scholars.

Abstract

Abstract: With the rapid development of the new energy vehicle industry and the growing demand for diversified, customized products, the island assembly mode has emerged to address the limited flexibility of traditional automotive assembly lines. In real assembly environments, however, uncertain events occur frequently, with emergency order insertion as the typical case, and they severely restrict the stability and productivity of automotive final assembly. This work therefore studies dynamic scheduling optimization of island assembly lines under uncertain disturbances. First, a mixed-integer nonlinear programming model is formulated with two objectives: minimizing the maximum completion time (makespan) and minimizing the order-insertion change index. Second, a multi-objective dueling double deep Q-network (MO-D3QN) is designed to solve this model. State indicators and action scheduling rules are designed around the characteristics of the assembly islands, assembly processes, assembly products, and production transportation in the island assembly scenario. A continuous immediate reward component is constructed for each of the two objectives, and the components are aggregated by weighted-sum scalarization. Then, the MO-D3QN model is trained to select the best scheduling rule for each environment state. Finally, computational experiments are conducted on instances of three scales. The results show that MO-D3QN outperforms single scheduling rules, a random selection strategy, and classical DQN, which verifies its effectiveness and competitiveness.

Key words: island assembly line, automotive assembly, uncertain disturbances, dynamic scheduling, multi-objective deep reinforcement learning
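
The three mechanisms named in the abstract (weighted-sum reward scalarization, dueling aggregation, and the double-DQN target) can be illustrated with a minimal sketch. This is not the authors' implementation: the weight `w_makespan`, the rule names `"SPT"`, `"FIFO"`, `"EDD"`, and all numeric values are hypothetical placeholders, and the neural networks are replaced by plain lists of Q-values so the example runs with the Python standard library only.

```python
from typing import List, Sequence


def scalarize_reward(r_makespan: float, r_change: float,
                     w_makespan: float = 0.5) -> float:
    """Weighted-sum scalarization of two per-objective reward components.

    w_makespan is an assumed preference weight; the abstract does not
    state the weights used in the paper.
    """
    if not 0.0 <= w_makespan <= 1.0:
        raise ValueError("w_makespan must lie in [0, 1]")
    return w_makespan * r_makespan + (1.0 - w_makespan) * r_change


def dueling_q_values(state_value: float,
                     advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool) -> float:
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


def select_rule(q_values: Sequence[float], rules: Sequence[str]) -> str:
    """Greedy choice of a scheduling rule; each action is one rule."""
    return rules[argmax(q_values)]


if __name__ == "__main__":
    rules = ["SPT", "FIFO", "EDD"]  # hypothetical rule names
    reward = scalarize_reward(r_makespan=1.0, r_change=0.0, w_makespan=0.5)
    q_now = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])
    target = double_dqn_target(reward, gamma=0.5,
                               q_online_next=[0.2, 0.8, 0.5],
                               q_target_next=[1.0, 2.0, 3.0],
                               done=False)
    print(select_rule(q_now, rules), reward, target)
```

In this sketch the agent's action is a scheduling rule rather than a specific assignment, which matches the abstract's description of choosing the best rule per environment state. Scalarizing the reward lets a single Q-network handle both objectives, at the cost of fixing the trade-off through the chosen weight.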

CLC number: TP18

HUANG Ming, HUANG Sihan, CHEN Jianpeng, DONG Wei, WANG Baicun, RUAN Bing, GAO Yunpeng, WANG Guoxin, YAN Yan. Dynamic Scheduling Optimization of Island Assembly Lines Under Uncertain Disturbances by Multi-objective Deep Reinforcement Learning[J]. Journal of Mechanical Engineering, 2026, 62(5): 74-87.


Link to this article: http://www.cjmenet.com.cn/CN/10.3901/JME.260229

http://www.cjmenet.com.cn/CN/Y2026/V62/I5/74


