Journal of Mechanical Engineering ›› 2023, Vol. 59 ›› Issue (23): 1-22. doi: 10.3901/JME.2023.23.001
Review of the Application of Goal-directed Cognitive Mechanism in Robotics
CONG Ming1, LI Jinzhong1, LIU Dong1, DU Yu2
Received: 2022-12-05
Revised: 2023-06-08
Online: 2023-12-05
Published: 2024-02-20
Corresponding author:
LIU Dong (corresponding author), male, born in 1985, PhD, associate professor, and doctoral supervisor. His research interests include intelligent robots and systems, intelligent control, and robot mechanisms. E-mail: liud@dlut.edu.cn
About the authors:
CONG Ming, male, born in 1963, PhD, professor, and doctoral supervisor. His research interests include mechanisms and robotics, intelligent control, industrial robot technology and applications, and bionic robots and their control. E-mail: congm@dlut.edu.cn; LI Jinzhong, male, born in 1994, PhD candidate. His research interests include intelligent robots and systems, and intelligent control. E-mail: lijz_1994@mail.dlut.edu.cn; DU Yu, female, born in 1981, PhD. Her research interests include robot cognitive control, robot mechanisms, and human-robot interaction. E-mail: duyu@djtu.edu.cn
Abstract: Researchers have sought to distill cognitive mechanisms from how humans handle complex tasks and to transfer them to robots, so that robots can take over such tasks on humans' behalf. In the field of human-robot interaction in particular, techniques for teaching robots new skills through programming by demonstration and imitation learning have been proposed. A key remaining challenge is to endow robots with the ability to infer human intentions, and to learn new skills through goal-based imitation rather than by following demonstrated motion trajectories (trajectory- or action-based imitation). Unlike the cognitive mechanism of habitual behavior (an input-to-action mapping), goal-based approaches first infer the goals of a behavior and then generate plans to achieve those goals. The presence of goals is considered key to humans' higher-level cognitive abilities, and it offers an important reference for teaching robots the skills humans use to handle complex tasks. This review first draws on research in cognitive science to show that goal-directed behavior plays an important role in higher-level human cognition; it then surveys computational frameworks that embody goal-directed behavior across the three paradigms of artificial intelligence (behaviorism, symbolism, and connectionism), and goes on to cover applications of goal-directed cognitive foundations in biomimetic robot navigation, intention recognition for human-robot interaction, and robot skill learning. Finally, it reviews related research projects from recent years and offers some suggestions for future development.
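To make the goal-based pipeline described above concrete, the following minimal sketch (not from the paper; the grid world, the candidate-goal set GOALS, the rationality parameter BETA, and the helpers infer_goal and plan_to_goal are all hypothetical illustrations) shows the two stages in order: the observer first infers the demonstrator's goal by Bayesian inversion of a noisily rational action model, and only then plans its own path to the inferred goal, instead of replaying the demonstrated trajectory.

```python
import numpy as np

GOALS = [(4, 0), (4, 4), (0, 4)]              # candidate goal cells (assumed known)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # grid moves: right, left, up, down
BETA = 3.0                                    # demonstrator rationality (softmax inverse temperature)

def next_dist(pos, action, goal):
    """Distance to the goal after taking `action` from `pos` (lower is better)."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    return float(np.hypot(nxt[0] - goal[0], nxt[1] - goal[1]))

def action_likelihood(pos, action, goal):
    """P(action | state, goal) under a softmax (noisily rational) policy."""
    logits = np.array([-BETA * next_dist(pos, a, goal) for a in ACTIONS])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[ACTIONS.index(action)]

def infer_goal(demo):
    """Stage 1: posterior over candidate goals given observed (state, action) pairs."""
    log_post = np.zeros(len(GOALS))           # uniform prior over goals
    for pos, act in demo:
        for g, goal in enumerate(GOALS):
            log_post[g] += np.log(action_likelihood(pos, act, goal))
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

def plan_to_goal(start, goal, max_steps=20):
    """Stage 2: greedy plan toward the inferred goal; the path need not match the demo."""
    path, pos = [start], start
    while pos != goal and len(path) <= max_steps:
        act = min(ACTIONS, key=lambda a: next_dist(pos, a, goal))
        pos = (pos[0] + act[0], pos[1] + act[1])
        path.append(pos)
    return path

# Three observed steps moving right from (0, 0) are enough to identify the goal.
demo = [((0, 0), (1, 0)), ((1, 0), (1, 0)), ((2, 0), (1, 0))]
posterior = infer_goal(demo)
goal = GOALS[int(np.argmax(posterior))]       # posterior concentrates on (4, 0)
print("goal posterior:", posterior.round(3))
print("plan from a new start:", plan_to_goal((2, 2), goal))
```

Running the sketch concentrates the posterior on the goal at (4, 0); the planner then reaches that goal from a different start state by a path the demonstrator never showed, which is the essence of goal-based, as opposed to trajectory-based, imitation.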
CONG Ming, LI Jinzhong, LIU Dong, DU Yu. Review of the Application of Goal-directed Cognitive Mechanism in Robotics[J]. Journal of Mechanical Engineering, 2023, 59(23): 1-22.