Interactive Operation Agent Scheduling Method for Job Shop Based on Deep Reinforcement Learning

doi:10.3901/JME.2023.12.078

Journal of Mechanical Engineering, 2023, Vol. 59, Issue (12): 78-88.

Invited Column: Manufacturing Big Data Analysis and Decision-Making

CHEN Ruiqi¹, LI Wenxin¹,², WANG Chuanyang¹, YANG Hongbing¹

1. School of Mechanical and Electric Engineering, Soochow University, Suzhou 215100;
2. School of Management, Shanghai University, Shanghai 200230

Received: 2022-07-02 Revised: 2023-02-06 Online: 2023-06-20 Published: 2023-08-15
Corresponding author: YANG Hongbing, male, born in 1977, PhD, associate professor. His main research interests include manufacturing system modeling and analysis, and production scheduling and optimization methods. E-mail: yanghongbing@suda.edu.cn
About the authors: CHEN Ruiqi, male, born in 1999. His main research interests include production scheduling systems, combinatorial optimization problems, and deep reinforcement learning. E-mail: Rui.IE@outlook.com; LI Wenxin, female, born in 1999. Her main research interests include production/logistics scheduling and deep reinforcement learning. E-mail: liwenxinn@gmail.com; WANG Chuanyang, male, born in 1972, PhD, professor, doctoral supervisor. His main research interests include intelligent manufacturing and advanced manufacturing technology. E-mail: cywang@suda.edu.cn
Funding:
National Natural Science Foundation of China (52075354).

Abstract

Abstract: The job shop scheduling problem (JSSP) is NP-hard, which makes high-quality solutions difficult to obtain quickly, and random disturbances in production scenarios cause frequent rescheduling. To address these difficulties, a novel interactive operation agent (IOA) scheduling model framework is proposed based on deep reinforcement learning. Based on an analysis of the process-route and machine constraints among operations, each operation in the job shop is modeled as an operation agent, and an interaction mechanism between operation agents is designed: agents exchange features according to their mutual relationships and update their own feature vectors. A deep neural network that fits the action-value function is then designed based on the operation features and the earliest processing time, so that the scheduling model can generate a scheduling policy from the system state and the operation agent features. The IOA scheduling model is trained with the Double DQN algorithm, and an experience replay mechanism is introduced to remove the correlation between sequential training samples. The trained model can quickly generate high-quality schedules and can effectively carry out rescheduling when a machine fails. Experimental results show that the proposed IOA scheduling method outperforms a greedy algorithm and heuristic scheduling rules, and that it has good robustness and generalization ability.

Key words: Job shop scheduling, deep reinforcement learning, operation agents, machine failure, double DQN
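
The abstract names two standard training components, Double DQN and experience replay, without giving implementation details. The following is a minimal sketch of both under stated assumptions, not the authors' implementation: the names `ReplayBuffer` and `double_dqn_target`, the discount factor, and the toy Q-values are all hypothetical, and each action index is assumed to stand for one candidate operation that could be scheduled next.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size transition store (hypothetical helper, not from the paper).

    Sampling uniformly at random breaks up the correlation between
    consecutive transitions of one scheduling episode.
    """

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for a single transition.

    The online network selects the next action (argmax over next_q_online);
    the target network evaluates that action (lookup in next_q_target).
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Toy transition with three candidate operations (assumed values).
y = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],   # online net prefers action 1
    next_q_target=[0.4, 0.3, 0.8],   # target net values action 1 at 0.3
    done=False,
    gamma=0.9,
)
print(y)  # 1.0 + 0.9 * 0.3, approximately 1.27
```

In this toy case a plain DQN target would instead take the maximum of the target network's values (0.8), giving about 1.72; decoupling action selection from action evaluation is how Double DQN reduces that overestimation.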

CLC number: TH166

CHEN Ruiqi, LI Wenxin, WANG Chuanyang, YANG Hongbing. Interactive Operation Agent Scheduling Method for Job Shop Based on Deep Reinforcement Learning[J]. Journal of Mechanical Engineering, 2023, 59(12): 78-88.
