• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering (机械工程学报) ›› 2026, Vol. 62 ›› Issue (5): 74-87. doi: 10.3901/JME.260229

• Invited Special Column: Information-Driven Final Assembly Pull Production Modes, Technologies, and Applications

Dynamic Scheduling Optimization of Island Assembly Lines Under Uncertain Disturbances by Multi-objective Deep Reinforcement Learning

HUANG Ming1, HUANG Sihan1,2, CHEN Jianpeng1, DONG Wei1, WANG Baicun3, RUAN Bing4, GAO Yunpeng5, WANG Guoxin1,2, YAN Yan1,2

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081;
  2. Key Laboratory of Industry Knowledge & Data Fusion Technology and Application, Ministry of Industry and Information Technology, Beijing Institute of Technology, Beijing 100081;
  3. School of Mechanical Engineering, Zhejiang University, Hangzhou 310058;
  4. Automotive Engineering Corporation, Tianjin 300113;
  5. SINOMACH Intelligence Technology Research Institute Co., Ltd., Beijing 100013
  • Received: 2025-02-25 Revised: 2025-07-15 Published: 2026-04-23
  • About the authors: HUANG Ming (黄铭), male, born in 1999, doctoral candidate. His main research interests are scheduling and dynamic optimization of intelligent manufacturing systems. E-mail: huangming@bit.edu.cn
    HUANG Sihan (黄思翰) (corresponding author), male, born in 1991, PhD, distinguished research fellow and doctoral supervisor. His main research interests are embodied-intelligence reconfigurable manufacturing, human-centered intelligent manufacturing (人体智造), and digital-twin-based management, control, and optimization. E-mail: hsh@bit.edu.cn
    CHEN Jianpeng (陈建鹏), male, born in 2000, master's student. His main research interest is human-machine collaborative operation. E-mail: 2393225727@qq.com
    DONG Wei (董威), male, born in 2002, master's student. His main research interests are intelligent simulation and optimization of intelligent manufacturing systems. E-mail: 1635611987@qq.com
  • Funding:
    Supported by the National Key Research and Development Program of China (2024YFB3309801), the National Natural Science Foundation of China (52405530), the Beijing Natural Science Foundation Key Research Topic (L243009), and the Beijing Institute of Technology Research Fund Program for Young Scholars.

Abstract: With the rapid development of the new energy vehicle industry and the growing trend toward diversified, customized market demand, an island assembly mode has emerged to address the limited flexibility of traditional automotive assembly lines. In real assembly environments, however, uncertain events occur frequently, with emergency order insertion as a typical example, and they severely constrain the stability and productivity of automotive final assembly. This work therefore studies dynamic scheduling optimization of island assembly lines under uncertain disturbances. First, a mixed-integer nonlinear programming model is formulated with two objectives: minimizing the maximum completion time and minimizing the order-insertion change index. Second, a multi-objective dueling double deep Q-network (MO-D3QN) is designed to solve this model. State indicators and scheduling rules, which serve as the actions, are designed around the characteristics of the assembly islands, assembly processes, assembly products, and production transportation in the island assembly scenario. A continuous immediate reward component is constructed for each of the two objectives, and the components are aggregated by weighted-sum scalarization. Then, the MO-D3QN model is trained to select the best scheduling rule for each environment state. Finally, computational experiments are conducted on instances of three scales. The results show that MO-D3QN outperforms individual scheduling rules, a random selection strategy, and the classical DQN, which verifies its effectiveness and competitiveness.

Key words: island assembly line, automotive assembly, uncertain disturbances, dynamic scheduling, multi-objective deep reinforcement learning
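
The abstract names weighted-sum scalarization and a dueling double deep Q-network but gives no formulas. As a minimal sketch, assuming the standard textbook forms of these three pieces (weighted-sum reward aggregation, the mean-centered dueling aggregation, and the double DQN target), they could be written as follows. The function names, the weight `w_makespan`, and every numeric value are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Sequence


def scalarize_reward(r_makespan: float, r_change: float,
                     w_makespan: float = 0.5) -> float:
    """Weighted-sum scalarization of two reward components.

    r_makespan and r_change stand in for the paper's two continuous
    immediate reward components (assumed names); w_makespan is an
    assumed weight in [0, 1], and the second weight is 1 - w_makespan.
    """
    if not 0.0 <= w_makespan <= 1.0:
        raise ValueError("w_makespan must lie in [0, 1]")
    return w_makespan * r_makespan + (1.0 - w_makespan) * r_change


def dueling_q_values(state_value: float,
                     advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool) -> float:
    """Double DQN target.

    The online network selects the greedy next action; the target
    network supplies the value of that action.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)),
                      key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Each action index would correspond to one scheduling rule.
    r = scalarize_reward(r_makespan=0.8, r_change=-0.2, w_makespan=0.5)
    q_online = dueling_q_values(1.0, [0.5, -0.5, 0.0])
    q_target = dueling_q_values(2.0, [0.0, 1.0, -1.0])
    y = double_dqn_target(r, 0.9, q_online, q_target, done=False)
    print(r, q_online, q_target, y)
```

In this sketch the weight trades off the completion-time objective against the order-insertion change objective, and each entry of the Q-value list scores one candidate scheduling rule in the current state.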

CLC number: