• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering, 2024, Vol. 60, Issue (6): 44-57. doi: 10.3901/JME.2024.06.044

• Invited column: Data-knowledge hybrid-driven intelligent manufacturing systems


Yard Crane Scheduling Method Based on Deep Reinforcement Learning for the Automated Container Terminal

WANG Wuyin1, HUANG Zizhao1, ZHUANG Zilong1, FANG Huaijin2, QIN Wei1

  1. Institute of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai 200240;
  2. Shanghai International Port (Group) Co., Ltd., Shanghai 200080
  • Received: 2023-07-07 Revised: 2023-11-21 Online: 2024-03-20 Published: 2024-06-07
  • Corresponding author: QIN Wei, male, born in 1982, PhD, associate professor and doctoral supervisor. Main research interests: modeling, control and optimization of complex systems; theory, methods and applications of machine intelligence. E-mail: wqin@sjtu.edu.cn
  • About the authors: WANG Wuyin, female, born in 1999. Main research interest: intelligent scheduling algorithms. E-mail: yiiiiiiner@sjtu.edu.cn; HUANG Zizhao, male, born in 1996, master's degree. Main research interests: production planning and scheduling control, intelligent optimization algorithms. E-mail: huangzz96@sjtu.edu.cn; ZHUANG Zilong, male, born in 1995, PhD candidate. Main research interests: modeling and optimization of complex systems, machine learning and industrial intelligence. E-mail: zhuangzl@sjtu.edu.cn; FANG Huaijin, male, born in 1963, master's degree, senior economist, vice president of Shanghai International Port (Group) Co., Ltd., responsible for engineering technology and technological innovation. E-mail: fanghj@portshanghai.cn
  • Funding:
    National Key Research and Development Program of China (2019YFB1704401).

Abstract: Yard cranes are the core handling machinery in the yard of an automated container terminal, and their proper scheduling is key to improving container handling efficiency. To address the complex spatio-temporal coupling and highly dynamic nature of the yard crane scheduling problem, a mathematical programming model is formulated with the objective of minimizing the waiting time of automated guided vehicles (AGVs) and external container trucks, and a novel deep reinforcement learning method is proposed to solve it. The algorithm defines an agent that closely reflects the actual yard operating environment, and in the agent-environment interaction it combines a pointer network, an attention mechanism and an actor-critic (A-C) architecture to improve its ability to extract hidden patterns from the state. Experiments on instances of different scales generated from actual data of the Yangshan Phase IV Automated Terminal show that the proposed algorithm produces near-optimal yard crane scheduling schemes in a relatively short time and performs about 17% better than several heuristic rule-based algorithms. The results indicate that the proposed scheduling method is effective and superior, and that it can provide dynamic decision support for yard operations in practice.

Key words: automated container terminal, yard, yard crane scheduling, deep reinforcement learning
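The abstract states the optimization objective (minimizing AGV and external truck waiting time) only in words. As a hedged illustration, the Python sketch below evaluates that kind of objective for one yard crane's service sequence. It is not the authors' model: the `Request` fields, the linear gantry travel time `travel_per_bay`, the per-vehicle `weights`, the start at bay 0 and time 0, and the definition of waiting time as service start minus arrival are all assumptions made for this example. The exhaustive `best_sequence` helper (also hypothetical) only works for a handful of requests, since the number of sequences grows factorially.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class Request:
    """A single container move at one yard block (hypothetical fields)."""
    vehicle: str     # "AGV" (waterside) or "truck" (external, landside)
    arrival: float   # time the vehicle arrives at the block, in seconds
    bay: int         # bay index where the container is handled
    handling: float  # crane time needed for this move, in seconds

def total_waiting_time(sequence, requests, travel_per_bay=5.0, weights=None):
    """Weighted sum of (service start - arrival) for one crane serving
    `requests` in the order given by `sequence` (indices into `requests`).

    Assumptions: the crane starts at bay 0 at time 0, gantry travel time is
    linear in bay distance, and a move cannot start before its vehicle arrives.
    """
    weights = weights or {"AGV": 1.0, "truck": 1.0}
    t, bay, total = 0.0, 0, 0.0
    for idx in sequence:
        r = requests[idx]
        t += abs(r.bay - bay) * travel_per_bay  # gantry travel to the target bay
        start = max(t, r.arrival)               # crane idles if it arrives first
        total += weights[r.vehicle] * (start - r.arrival)  # vehicle waiting time
        t = start + r.handling                  # crane is busy until the move ends
        bay = r.bay
    return total

def best_sequence(requests, **kwargs):
    """Exhaustive search over all service orders; tiny instances only."""
    return min(permutations(range(len(requests))),
               key=lambda seq: total_waiting_time(seq, requests, **kwargs))

reqs = [Request("AGV", 0.0, 2, 60.0),
        Request("truck", 10.0, 8, 90.0),
        Request("AGV", 30.0, 4, 60.0)]
print(total_waiting_time([0, 1, 2], reqs))  # 280.0
print(best_sequence(reqs))                  # (0, 2, 1)
```

Serving the nearer AGV request at bay 4 before the truck at bay 8 cuts the toy instance's total waiting time from 280 s to 210 s; a learned policy aims to make such ordering decisions without enumerating every sequence.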
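The pointer network and attention mechanism named in the abstract select the next task by "pointing" at one of the remaining inputs. The Python sketch below shows only that selection step, under stated assumptions: scaled dot-product scoring, precomputed request embeddings and a fixed decoder query are all illustrative choices, and the functions `pointer_probs` and `greedy_decode` are hypothetical. It omits the encoder, the actor-critic training and the paper's actual state features.

```python
import math

def pointer_probs(query, keys, served):
    """Pointer-style attention over candidate requests.

    Scores every request embedding in `keys` against the decoder `query`
    with a scaled dot product, masks requests already served, and returns
    a softmax distribution over the remaining ones.
    """
    if all(served):
        raise ValueError("no unserved request left to point at")
    scale = math.sqrt(len(query))
    scores = [float("-inf") if done
              else sum(q * k for q, k in zip(query, key)) / scale
              for key, done in zip(keys, served)]
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]     # masked entries become 0.0
    z = sum(exps)
    return [e / z for e in exps]

def greedy_decode(query, keys):
    """Repeatedly point at the most probable unserved request.

    Simplification: the decoder query stays fixed; in a trained model it
    would be updated from the decoder state after every selection.
    """
    served = [False] * len(keys)
    order = []
    for _ in keys:
        probs = pointer_probs(query, keys, served)
        nxt = max(range(len(probs)), key=probs.__getitem__)
        served[nxt] = True
        order.append(nxt)
    return order

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy request embeddings
print(greedy_decode([1.0, 0.5], keys))       # [2, 0, 1]
```

During actor-critic training the actor would typically sample from these probabilities instead of taking the argmax, while a critic estimates a baseline value of the state to reduce the variance of the policy gradient; neither part is shown here.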
