• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2023, Vol. 59, Issue (12): 78-88. doi: 10.3901/JME.2023.12.078

• Invited Column: Big Data Analysis and Decision-Making in Manufacturing


Interactive Operation Agent Scheduling Method for Job Shop Based on Deep Reinforcement Learning

CHEN Ruiqi1, LI Wenxin1,2, WANG Chuanyang1, YANG Hongbing1

  1. School of Mechanical and Electric Engineering, Soochow University, Suzhou 215000;
  2. School of Management, Shanghai University, Shanghai 200230
  • Received: 2022-07-02 Revised: 2023-02-06 Online: 2023-06-20 Published: 2023-08-15
  • Corresponding author: YANG Hongbing, male, born in 1977, PhD, associate professor. His main research interests include manufacturing system modeling and analysis, and production scheduling and optimization methods. E-mail: yanghongbing@suda.edu.cn
  • About the authors: CHEN Ruiqi, male, born in 1999. His main research interests include production scheduling systems, combinatorial optimization problems, and deep reinforcement learning. E-mail: Rui.IE@outlook.com; LI Wenxin, female, born in 1999. Her main research interests include production/logistics scheduling and deep reinforcement learning. E-mail: liwenxinn@gmail.com; WANG Chuanyang, male, born in 1972, PhD, professor, doctoral supervisor. His main research interests include intelligent manufacturing and advanced manufacturing technology. E-mail: cywang@suda.edu.cn
  • Supported by:
    National Natural Science Foundation of China (52075354).

Abstract: The job shop scheduling problem (JSSP) is NP-hard, which makes high-quality solutions difficult to obtain quickly, and random disturbances in production frequently force rescheduling. To address these difficulties, a novel interactive operation agent (IOA) scheduling model framework based on deep reinforcement learning is proposed. Building on an analysis of the process-route and machine constraints among operations, each operation in the job shop is modeled as an operation agent, and an interaction mechanism among operation agents is designed: agents exchange features according to their mutual relationships and update their own feature vectors. A deep neural network that fits the action-value function is then designed from the operation features and the earliest processing time, so that the scheduling model can generate a scheduling policy from the system state and the features of the operation agents. The IOA scheduling model is trained with the Double DQN algorithm, and an experience replay mechanism is introduced to remove the correlation between sequential training samples. The trained model quickly generates high-quality schedules and effectively carries out rescheduling when a machine fails. Experimental results show that the proposed IOA scheduling method outperforms a greedy algorithm and heuristic scheduling rules, and that it has good robustness and generalization ability.
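
As a rough illustration of the training idea named in the abstract, the sketch below shows an experience replay buffer and the Double DQN target computation in plain Python. The buffer capacity, the discount factor `gamma`, the toy Q-tables `q_online` and `q_target`, and all class and function names are assumptions made for illustration only; the paper's actual network, state features, and reward design are not given on this page and are not reproduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; uniform random sampling breaks up
    the correlation between consecutive training samples."""

    def __init__(self, capacity=1000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Double DQN target: the online estimator selects the next action,
    and the target estimator evaluates that action."""
    if done:
        return reward
    next_q_online = q_online[next_state]
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * q_target[next_state][best_action]


# Toy stand-ins for the two networks (hypothetical values): Q[state][action].
q_online = {"s1": [1.0, 3.0], "s2": [0.5, 0.2]}
q_target = {"s1": [2.0, 0.5], "s2": [0.4, 0.9]}

buffer = ReplayBuffer(capacity=100)
buffer.push("s0", 0, 1.0, "s1", False)  # non-terminal transition
buffer.push("s1", 1, 0.0, "s2", True)   # terminal transition

batch = buffer.sample(2)
targets = [double_dqn_target(r, s2, d, q_online, q_target, gamma=0.9)
           for (_, _, r, s2, d) in batch]
```

With these toy values, the online table prefers action 1 in `s1`, so the target is built from `q_target["s1"][1]` rather than the larger `q_target["s1"][0]`. That decoupling of action selection from action evaluation is what distinguishes Double DQN from taking the maximum of the target estimates directly.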

Keywords: job shop scheduling, deep reinforcement learning, operation agents, machine failure, Double DQN

CLC number: