Model and Architecture of Hierarchical Reinforcement Learning Based on Agent

Abstract

Abstract: A hierarchical reinforcement learning method is improved by introducing the frequency maximum Q-value heuristic learning algorithm, addressing the problem of learning an optimal behavior policy for an agent in a large state space and a dynamically changing environment. The classical belief, desire, intention (BDI) model is extended with attribute maintenance operators and with the consciousness attributes of commitment and planning, and a rational maintenance process for the consciousness attributes is given. This increases the agent's adaptability and enables it to learn online in a dynamic environment. Based on this consciousness model, an agent architecture with proactivity, adaptability, reactivity and sociality is proposed, and a path planning agent (APP) is developed on the basis of this architecture. The driving environment is configured to simulate complex vehicle driving states, and the optimal path is eventually obtained through continuous learning of these states, which demonstrates the feasibility and effectiveness of the architecture.

Key words: Agent, Reinforcement learning, Architecture, Consciousness model

CLC number: TP391.41

WANG Wenxi; XIAO Shide; MENG Xiangyin; CHEN Yingsong; ZHANG Weihua. Model and Architecture of Hierarchical Reinforcement Learning Based on Agent[J]. , 2010, 46(2): 76-82.

