• CN:11-2187/TH
  • ISSN:0577-6686

2010, Vol. 46, Issue (2): 76-82.

• Article

Model and Architecture of Hierarchical Reinforcement Learning Based on Agent
(Chinese title: 基于Agent的递阶强化学习模型与体系结构)

WANG Wenxi (王文玺); XIAO Shide (肖世德); MENG Xiangyin (孟祥印); CHEN Yingsong (陈应松); ZHANG Weihua (张卫华)

  1. School of Mechanical Engineering, Southwest Jiaotong University; State Key Laboratory of Traction Power, Southwest Jiaotong University
  • Published: 2010-01-20

Abstract: A hierarchical reinforcement learning method is improved by introducing the frequency maximum Q-value heuristic learning algorithm, which addresses the problem of learning an optimal behavior policy for an agent in a very large state space and a dynamically changing environment. The classical belief, desire, intention (BDI) model is extended with attribute maintenance operators and with two additional consciousness attributes, commitment and planning, and a rational maintenance process for the consciousness attributes is given. These extensions improve the agent's adaptability and give it the ability to learn online in a dynamic environment. Based on this consciousness model, an agent architecture with initiative, adaptability, reactivity, and sociality is proposed, and a path planning agent (APP) is developed on the basis of this architecture. The driving environment is configured to simulate complex vehicle driving states, and through continuous learning of these states the agent finally obtains the optimal path, which demonstrates the feasibility and effectiveness of the architecture.

Key words: Agent, Reinforcement learning, Architecture, Consciousness model
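
The abstract names the frequency maximum Q-value heuristic but gives no formulas, so the following is only a minimal sketch of how that heuristic is commonly formulated, not the authors' implementation. It assumes a single-state (stateless) learner, a Boltzmann action choice over the value EV(a) = Q(a) + c · freq(maxR(a)) · maxR(a), and hypothetical names throughout (`FMQLearner`, `alpha`, `c`, `temperature`). The hierarchical decomposition and the BDI extensions described in the abstract are not modeled here.

```python
import math
import random


class FMQLearner:
    """Stateless learner using a frequency-maximum-Q style heuristic.

    Illustrative sketch only: all names and default values are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, alpha=0.1, c=10.0, temperature=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha              # learning rate
        self.c = c                      # weight of the heuristic term
        self.temperature = temperature  # Boltzmann temperature
        self.rng = random.Random(seed)
        self.q = [0.0] * n_actions             # running Q estimate per action
        self.max_r = [-math.inf] * n_actions   # best reward seen per action
        self.count = [0] * n_actions           # times each action was taken
        self.count_max = [0] * n_actions       # times the best reward was seen

    def ev(self, a):
        """Heuristic value: Q(a) + c * freq(maxR(a)) * maxR(a)."""
        if self.count[a] == 0:
            return self.q[a]  # no experience yet, fall back to Q
        freq = self.count_max[a] / self.count[a]
        return self.q[a] + self.c * freq * self.max_r[a]

    def select_action(self):
        """Sample an action from a Boltzmann distribution over EV."""
        evs = [self.ev(a) for a in range(self.n_actions)]
        top = max(evs)  # subtract the maximum for numerical stability
        weights = [math.exp((e - top) / self.temperature) for e in evs]
        return self.rng.choices(range(self.n_actions), weights=weights)[0]

    def update(self, a, reward):
        """Record the reward for action a and update Q, maxR and counts."""
        self.count[a] += 1
        if reward > self.max_r[a]:
            self.max_r[a] = reward
            self.count_max[a] = 1
        elif reward == self.max_r[a]:
            self.count_max[a] += 1
        self.q[a] += self.alpha * (reward - self.q[a])
```

In use, a caller would alternate `select_action()` and `update(action, reward)`. The heuristic term biases selection toward actions whose best observed reward recurs often, which is the intuition behind applying it in large or changing environments.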

CLC number: