Review of the Application of Goal-directed Cognitive Mechanism in Robotics

doi:10.3901/JME.2023.23.001

Abstract

Abstract: People are trying to summarize the cognitive mechanism from the performance of humans in processing complex tasks and apply it to the robot to replace the human in processing complex tasks. In human-computer interaction, techniques have been proposed to teach robots new skills through programming demonstrations and imitation learning. A significant remaining challenge is to allow robots to infer human intentions and learn new skills through goal-based imitation rather than following demonstrated motion trajectories (track-based or motion imitation). Unlike the cognitive mechanism of habitual behavior (input-operation mapping), the goal-based approach first deduces the goals of the behavior and then generates a plan to achieve those goals. The existence of goals is the key to forming higher human cognitive ability. It has essential reference significance in guiding robots to learn human skills for handling complex tasks. Firstly, this research introduces the critical role of goal-oriented behavior in the process of advanced human cognition from the relevant research in the field of cognitive science and introduces the computing framework that embodies goal-oriented behavior from the three paradigms of artificial intelligence, behaviorism, symbolism, and connectionism. Then the application of goal-directed cognitive basis in bionic robot navigation, human-computer interaction intention recognition, and robot skill learning are introduced. Finally, the relevant research projects in recent years are introduced, and some suggestions for future development are given.

Key words: robot, imitation learning, goal-directed, cognitive

CLC Number:

TG156

CONG Ming, LI Jinzhong, LIU Dong, DU Yu. Review of the Application of Goal-directed Cognitive Mechanism in Robotics[J]. Journal of Mechanical Engineering, 2023, 59(23): 1-22.

References

[1] ABBEEL P，NG A Y. Apprenticeship learning via inverse reinforcement learning[C]//Proceedings of the 21th International Conference on Machine Learning，New York:ACM，2004:1.
[2] HO J，ERMON S. Generative adversarial imitation learning[J]. Advances in Neural Information Processing Systems，2016(1):4565-4573.
[3] SILVER D，HUBERT T，SCHRITTWIESER J，et al. A general reinforcement learning algorithm that masters chess，shogi，and go through self-play[J]. Science，2018，362(6419):1140-1144.
[4] SCHRITTWIESER J，ANTONOGLOU I，HUBERT T，et al. Mastering atari，go，chess and shogi by planning with a learned model[J]. Nature，2020，588(7839):604-609.
[5] LEVINE S，FINN C，DARRELL T，et al. End-to-end training of deep visuomotor policies[J]. The Journal of Machine Learning Research，2016，17(1):1334-1373.
[6] PINTO L，GUPTA A. Supersizing self-supervision:Learning to grasp from 50k tries and 700 robot hours[C]//Proceedings of the 2016 IEEE international conference on robotics and automation (ICRA)，Stockholm:IEEE，2016:3406-3413.
[7] FINN C，ABBEEL P，LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the International Conference on Machine Learning，Sydney:PMLR，2017:1126-1135.
[8] MISHRA N，ROHANINEJAD M，CHEN Xi，et al. A simple neural attentive meta-learner[C]//Proceedings of the 6th International Conference on Learning Representations (ICLR)，Vancouver:OpenReview.net，2018:1.
[9] SONG Xingyou，YANG Yuxiang，CHOROMANSKI K，et al. Rapidly adaptable legged robots via evolutionary meta-learning[C]//Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)，Las Vegas:IEEE，2020:3769-3776.
[10] ZHANG Tianhao，MCCARTHY Z，JOW O，et al. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation[C]//Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA)，Brisbane:IEEE，2018:5628-5635.
[11] WU Yueh-Hua，CHAROENPHAKDEE N，BAO Han，et al. Imitation learning from imperfect demonstration[C]//Proceedings of the International Conference on Machine Learning，California:PMLR，2019:6818-6827.
[12] BALAKRISHNA A，THANANJEYAN B，LEE J，et al. On-policy robot imitation learning from a converging supervisor[C]//Proceedings of the Conference on Robot Learning，PMLR，2020:24-41.
[13] PENG Xuebin，ANDRYCHOWICZ M，ZAREMBA W，et al. Sim-to-real transfer of robotic control with dynamics randomization[C]//2018 IEEE International Conference on Robotics and Automation(ICRA)，Brisbane:IEEE，2018:3803-3810.
[14] YANG Jiachen，PETERSEN B，ZHA Hongyuan，et al. Single episode policy transfer in reinforcement learning[C]//8th International Conference on Learning Representations (ICLR)，Addis Ababa:OpenReview.net，2020:1.
[15] HOUTHOOFT R，CHEN Y，ISOLA P，et al. Evolved policy gradients[J]. Advances in Neural Information Processing Systems，2018(1):5400-5409.
[16] WANG Haozhe，ZHOU Jiale，HE Xuming. Learning context-aware task reasoning for efficient meta reinforcement learning[C]//Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems，Auckland:International Foundation for Autonomous Agents and Multiagent Systems，2020:1440-1448.
[17] RAKELLY K，ZHOU A，FINN C，et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables[C]//Proceedings of the International Conference on Machine Learning，California:PMLR，2019:5331-5340.
[18] GLEISSNER B，BEKKERING H，MELTZOFF A N. Children’s coding of human action:Cognitive factors influencing imitation in 3 year old[J]. Developmental Science，2000，3(4):405-414.
[19] RUMIATI R I，BEKKERING H. To imitate or not to imitate? How the brain can do it，that is the question![J]. Brain and Cognition，2003，53(3):479-482.
[20] MELTZOFF A N，KUHL P K，MOVELLAN J，et al. Foundations for a new science of learning[J]. Science，2009，325(5938):284-288.
[21] SHON AP，STORZ J J，RAO R P N. Towards a real-time bayesian imitation system for a humanoid robot[C]//Proceedings of the 2007 IEEE International Conference on Robotics and Automation，Roma:IEEE，2007:2847-2852.
[22] HUANG D A，CHAO Y W，PAXTON C，et al. Motion reasoning for goal-based imitation learning[C]//Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE，2020:4878-4884.
[23] WOOD W，RÜNGER D. Psychology of habit[J]. Annual Review of Psychology，2016，67:289-314.
[24] DOLAN R J，DAYAN P. Goals and habits in the brain[J]. Neuron，2013，80(2):312-325.
[25] CHUNG M J Y，FRIESEN A L，FOX D，et al. A Bayesian developmental approach to robotic goal-based imitation learning[J]. PloS One，2015，10(11):e0141965.
[26] BALLEINE B W，DEZFOULI A，ITO M，et al. Hierarchical control of goal-directed action in the cortical–basal ganglia network[J]. Current Opinion in Behavioral Sciences，2015，5:1-7.
[27] MANNELLA F，GURNEY K，BALDASSARRE G. The nucleus accumbens as a nexus between values and goals in goal-directed behavior:A review and a new hypothesis[J]. Frontiers in Behavioral Neuroscience，2013，7:135.
[28] YIN H H，OSTLUND S B，KNOWLTON B J，et al. The role of the dorsomedial striatum in instrumental conditioning[J]. European Journal of Neuroscience，2005，22(2):513-523.
[29] BROVELLI A，NAZARIAN B，MEUNIER M，et al. Differential roles of caudate nucleus and putamen during instrumental learning[J]. Neuroimage，2011，57(4):1580-1590.
[30] JAHANSHAHI M，OBESO I，ROTHWELL J C，et al. A fronto–striato–subthalamic–pallidal network for goal-directed and habitual inhibition[J]. Nature Reviews Neuroscience，2015，16(12):719-732.
[31] CALIGIORE D，ARBIB M A，MIALL R C，et al. The super-learning hypothesis:Integrating learning processes across cortex，cerebellum and basal ganglia[J]. Neuroscience & Biobehavioral Reviews，2019，100:19-34.
[32] THILL S，CALIGIORE D，BORGHI A M，et al. Theories and computational models of affordance and mirror systems:An integrative review[J]. Neuroscience & Biobehavioral Reviews，2013，37(3):491-521.
[33] VAN-HORENBEKE F A，PEER A. Activity，plan，and goal recognition:A review[J]. Frontiers in Robotics and AI，2021，8:643010.
[34] VERNON D，VON HOFSTEN C，FADIGA L. A roadmap for cognitive development in humanoid robots[M]. Springer Science & Business Media，2011.
[35] VERNON D，BEETZ M，SANDINI G. Prospection in cognition:The case for joint episodic-procedural memory in cognitive robotics[J]. Frontiers in Robotics and AI，2015，2:19.
[36] ZIAFAR M，NAMAZIANDOST E. From behaviorism to new behaviorism:A review study[J]. Loquen:English Studies Journal，2019，12(2):109-116.
[37] ASCHERSLEBEN G，HOFER T，JOVANOVIC B. The link between infant attention to goal-directed action and later theory of mind abilities[J]. Dev Sci，2008，11(6):862-868.
[38] CHEN Boyuan，VONDRICK C，LIPSON H. Visual behavior modelling for robotic theory of mind[J]. Scientific Reports，2021，11(1):1-14.
[39] TRAFTON J G，CASSIMATIS N L，BUGAJSKA M D，et al. Enabling effective human-robot interaction using perspective-taking in robots[J]. IEEE Transactions on Systems，Man，and Cybernetics-Part A:Systems and Humans，2005，35(4):460-470.
[40] BERLIN M，GRAY J，THOMAZ A L，et al. Perspective taking:An organizing principle for learning in human-robot interaction[C]//Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference，Massachusetts:AAAI，2006:1444-1450.
[41] MILLIEZ G，WARNIER M，CLODIC A，et al. A framework for endowing an interactive robot with reasoning capabilities about perspective-taking and belief management[C]//Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication，Edinburgh:IEEE，2014:1103-1109.
[42] DEVIN S，ALAMI R. An implemented theory of mind to improve human-robot shared plans execution[C]//Proceedings of the 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI)，Christchurch:IEEE，2016:319-326.
[43] KWISTHOUT J. What can the PGM community contribute to the ‘Bayesian Brain’hypothesis?[J]. PGM，2018(1):37.
[44] STOCK A，STOCK C. A short history of ideo-motor action[J]. Psychological Research，2004，68(2):176-188.
[45] VERSCHOOR S，WEIDEMA M，BIRO S，et al. Where do action goals come from? Evidence for spontaneous action–effect binding in infants[J]. Frontiers in Psychology，2010，1:201.
[46] HOMMEL B. GOALIATH:A theory of goal-directed behavior[J]. Psychological Research，2022，86(4):1054-1077.
[47] BRATMAN M E. Intention and personal policies[J]. Philosophical Perspectives，1989，3:443-469.
[48] TOMASELLO M，CARPENTER M，CALL J，et al. Understanding and sharing intentions:The origins of cultural cognition[J]. Behavioral and Brain Sciences，2005，28(5):675-691.
[49] ONDOBAKA S，BEKKERING H. Hierarchy of idea-guided action and perception-guided movement[J]. Frontiers in Psychology，2012，3:579.
[50] SHIN Y K，PROCTOR R W，CAPALDI E J. A review of contemporary ideomotor theory[J]. Psychological Bulletin，2010，136(6):943.
[51] HOMMEL B. Action control according to TEC (theory of event coding)[J]. Psychological Research PRPF，2009，73(4):512-526.
[52] VERNON D. Bridging ideomotor theory and autonomous development with perceptuo-motor memory[J]. System，2015，2:F₂.
[53] VON HOFSTEN C. An action perspective on motor development[J]. Trends in Cognitive Sciences，2004，8(6):266-272.
[54] SELIGMAN M E P，RAILTON P，BAUMEISTER R F，et al. Navigating into the future or driven by the past[J]. Perspectives on Psychological Science，2013，8(2):119-141.
[55] VERNON D. Artificial cognitive systems:A primer[M]. Cambridge:MIT Press，2014.
[56] COUGHLIN C，LYONS K E，GHETTI S. Remembering the past to envision the future in middle childhood:Developmental linkages between prospection and episodic memory[J]. Cognitive Development，2014，30:96-110.
[57] DAW N D，DAYAN P. The algorithmic anatomy of model-based evaluation[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130478.
[58] BOTVINICK M，WEINSTEIN A. Model-based hierarchical reinforcement learning and human action control[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130480.
[59] FRISTON K，SCHWARTENBECK P，FITZGERALD T，et al. The anatomy of choice:Dopamine and decision-making[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130481.
[60] KACHERGIS G，WYATTE D，O'REILLY R C，et al. A continuous-time neural model for sequential action[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130623.
[61] BAKER C L，TENENBAUM J B，SAXE R R. Bayesian models of human action understanding[C]//Proceedings of the 18th International Conference on Neural Information Processing Systems，Vancouver:MIT Press，2005:99-106.
[62] BASANISI R，BROVELLI A，CARTONI E，et al. A generative spiking neural-network model of goal-directed behaviour and one-step planning[J]. PLOS Computational Biology，2020，16(12):e1007579.
[63] DOYA K，ISHII S，POUGET A，et al. Bayesian brain:Probabilistic approaches to neural coding[M]. Cambridge:MIT Press，2007.
[64] BOTVINICK M，TOUSSAINT M. Planning as inference[J]. Trends in Cognitive Sciences，2012，16(10):485-488.
[65] KAPPEN H J，GÓMEZ V，OPPER M. Optimal control as a graphical model inference problem[J]. Machine Learning，2012，87(2):159-182.
[66] HOMMEL B，MÜSSELER J，ASCHERSLEBEN G，et al. The theory of event coding (TEC):A framework for perception and action planning[J]. Behavioral and Brain Sciences，2001，24(5):849-878.
[67] HOMMEL B. Theory of event coding (TEC) V2.0:Representing and controlling perception and action[J]. Attention，Perception，& Psychophysics，2019，81(7):2139-2154.
[68] HOMMEL B. Binary theorizing does not account for action control[J]. Frontiers in Psychology，2019，10:2542.
[69] BERNHARD H. The theory of event coding (TEC) as embodied-cognition framework[J]. Frontiers in Psychology，2015，6:1318.
[70] GLÄSCHER J，DAW N，DAYAN P，et al. States versus rewards:Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning[J]. Neuron，2010，66(4):585-595.
[71] BALLEINE B W，O'DOHERTY J P. Human and rodent homologies in action control:Corticostriatal determinants of goal-directed and habitual action[J]. Neuropsychopharmacology，2010，35(1):48-69.
[72] LENZ I，KNEPPER R A，SAXENA A. DeepMPC:Learning deep latent features for model predictive control [C]//Proceedings of the RSS 2015:Robotics:Science and Systems Conference，Rome:MIT Press，2015:10.
[73] PUNJANI A，ABBEEL P. Deep learning helicopter dynamics models[C]//Proceedings of 2015 IEEE International Conference on Robotics and Automation (ICRA)，Washington:IEEE，2015:3223-3230.
[74] HUANG Y，YAPLE Z A，YU R. Goal-oriented and habitual decisions:Neural signatures of model-based and model-free learning[J]. NeuroImage，2020，215:116834.
[75] HUYS Q J M，CRUICKSHANK A，SERIÈS P. Reward-based learning，model-based and model-free[M]. New York:Springer，2014.
[76] DAW N D. Model-based reinforcement learning as cognitive search:Neurocomputational theories[J]. Cognitive Search:Evolution，Algorithms and the Brain，2012(1):195-208.
[77] BOTVINICK M M，NIV Y，BARTO A G. Hierarchically organized behavior and its neural foundations:A reinforcement learning perspective[J]. Cognition，2009，113(3):262-280.
[78] PATERIA S，SUBAGDJA B，TAN A，et al. Hierarchical reinforcement learning:A comprehensive survey[J]. ACM Computing Surveys (CSUR)，2021，54(5):1-35.
[79] NASIRIANY S，PONG V，LIN S，et al. Planning with goal-conditioned policies[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems，Vancouver:NeurIPS，2019:14843-14854.
[80] LIU Minghuan，ZHU Menghui，ZHANG Weinan. Goal-conditioned reinforcement learning:Problems and solutions[J]. arXiv preprint arXiv:2201.08299，2022.
[81] COLAS C，KARCH T，SIGAUD O，et al. Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning:A short survey[J]. Journal of Artificial Intelligence Research，2022，74:1159-1199.
[82] ÇATAL O，VERBELEN T，NAUTA J，et al. Learning perception and planning with deep active inference[C]//Proceedings of 2020 IEEE International Conference on Acoustics，Speech and Signal Processing (ICASSP)，Barcelona:IEEE，2020:3952-3956.
[83] SIMS M，PEZZULO G. Modelling ourselves:what the free energy principle reveals about our implicit notions of representation[J]. Synthese，2021，199(3):7801-7833.
[84] FRISTON K，FITZGERALD T，RIGOLI F，et al. Active inference:A process theory[J]. Neural Computation，2017，29(1):1-49.
[85] FRISTON K，FITZGERALD T，RIGOLI F，et al. Active inference and learning[J]. Neuroscience & Biobehavioral Reviews，2016，68:862-879.
[86] FRISTON K. The free-energy principle:A unified brain theory?[J]. Nature reviews neuroscience，2010，11(2):127-138.
[87] FRISTON K，SAMOTHRAKIS S，MONTAGUE R. Active inference and agency:Optimal control without cost functions[J]. Biological Cybernetics，2012，106(8):523-541.
[88] FRISTON K，SCHWARTENBECK P，FITZGERALD T，et al. The anatomy of choice:Active inference and agency[J]. Frontiers in Human Neuroscience，2013，7:598.
[89] DA COSTA L，LANILLOS P，SAJID N，et al. How active inference could help revolutionise robotics[J]. Entropy，2022，24(3):361.
[90] TANIGUCHI T，NAGAI T，NAKAMURA T，et al. Symbol emergence in robotics:A survey[J]. Advanced Robotics，2016，30(11-12):706-728.
[91] SAWAGUCHI T. Brain studies of symbol manipulation:focusing on neuronal studies[J]. Cognitive studies:Bulletin of the Japanese Cognitive Science Society，2000，7:189-194.
[92] VERSCHURE P F M J，PENNARTZ C M A，PEZZULO G. The why，what，where，when and how of goal-directed choice:Neuronal and computational principles[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130483.
[93] CIMATTI A，PISTORE M，TRAVERSO P. Automated planning[J]. Foundations of Artificial Intelligence，2008，3:841-867.
[94] KARPAS E，MAGAZZENI D. Automated planning for robotics[J]. Annual Review of Control，Robotics，and Autonomous Systems，2020，3:417-439.
[95] FRANK J D. Artificial intelligence:Powering human exploration of the moon and mars[J/OL]. ArXiv. https://doi.org/10.48550/arXiv.1910.03014.
[96] DAYAN P，HINTON G E，NEAL R M，et al. The helmholtz machine[J]. Neural Computation，1995，7(5):889-904.
[97] SOLWAY A，BOTVINICK M M. Goal-directed decision making as probabilistic inference:A computational framework and potential neural correlates[J]. Psychological Review，2012，119(1):120.
[98] PENNY W D，ZEIDMAN P，BURGESS N. Forward and backward inference in spatial cognition[J]. PLoS Computational Biology，2013，9(12):e1003383.
[99] PEZZULO G，VAN DER MEER M A A，LANSINK C S，et al. Internally generated sequences in learning and executing goal-directed behavior[J]. Trends in Cognitive Sciences，2014，18(12):647-657.
[100] GRIFFITHS T L，KEMP C，TENENBAUM J B. Bayesian models of cognition[M]. Cambridge:Cambridge University Press，2008.
[101] BOTVINICK M，AN J. Goal-directed decision making in prefrontal cortex:A computational framework[J]. Advances in Neural Information Processing Systems，2008(1):21.
[102] TOUSSAINT M，STORKEY A. Probabilistic inference for solving discrete and continuous state Markov decision processes[C]//Proceedings of the the 23rd International Conference on Machine Learning，New York:ACM，2006:945-952.
[103] CHAO Z C，BAKKUM D J，POTTER S M. Shaping embodied neural networks for adaptive goal-directed behavior[J]. PLoS Computational Biology，2008，4(3):e1000042.
[104] FRIEDRICH J，LENGYEL M. Goal-directed decision making with spiking neurons[J]. Journal of Neuroscience，2016，36(5):1529-1546.
[105] RUECKERT E，KAPPEL D，TANNEBERG D，et al. Recurrent spiking networks solve planning tasks[J]. Scientific Reports，2016，6(1):1-10.
[106] OTTE S，SCHMITT T，FRISTON K，et al. Inferring adaptive goal-directed behavior within recurrent neural networks[C]//Proceedings of the International Conference on Artificial Neural Networks，Alghero:Springer，2017:227-235.
[107] TANNEBERG D，PARASCHOS A，PETERS J，et al. Deep spiking networks for model-based planning in humanoids[C]//Proceedings of 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)，Cancun:IEEE，2016:656-661.
[108] RÜCKERT E A，NEUMANN G，TOUSSAINT M，et al. Learned graphical models for probabilistic planning provide a new class of movement primitives[J]. Frontiers in Computational Neuroscience，2013，6:97.
[109] MATSUMOTO T，TANI J. Goal-directed planning for habituated agents by active inference using a variational recurrent neural network[J]. Entropy，2020，22(5):564.
[110] MISYAK J B，CHATER N. Virtual bargaining:a theory of social decision-making[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130487.
[111] LAKE B M，ULLMAN T D，TENENBAUM J B，et al. Building machines that learn and think like people[J]. Behavioral and Brain Sciences，2017，40:e253.
[112] CHAI Jie，RUAN Xiaogang，HUANG Jing. NLM-HS:Navigation learning model based on a hippocampal-striatal circuit for explaining navigation mechanisms in animal brains[J]. Brain Sciences，2021，11(6):803.
[113] KUNZ L，WANG L，LACHNER-PIZA D，et al. Hippocampal theta phases organize the reactivation of large-scale electrophysiological representations during goal-directed navigation[J]. Science Advances，2019，5(7):eaav8192.
[114] ITO H T，ZHANG S J，WITTER M P，et al. A prefrontal-thalamo-hippocampal circuit for goal-directed spatial navigation[J]. Nature，2015，522(7554):50-55.
[115] PFEIFFER B E，FOSTER D J. Hippocampal place-cell sequences depict future paths to remembered goals[J]. Nature，2013，497(7447):74-79.
[116] AOKI Y，IGATA H，IKEGAYA Y，et al. The integration of goal-directed signals onto spatial maps of hippocampal place cells[J]. Cell Reports，2019，27(5):1516-1527.
[117] MILFORD M，SCHULZ R. Principles of goal-directed spatial robot navigation in biomimetic models[J]. Philosophical Transactions of the Royal Society B:Biological Sciences，2014，369(1655):20130484.
[118] KAPLAN R，FRISTON K J. Planning and navigation as active inference[J]. Biological Cybernetics，2018，112(4):323-343.
[119] ÇATAL O，VERBELEN T，VAN DE MAELE T，et al. Robot navigation as hierarchical active inference[J]. Neural Networks，2021，142:192-204.
[120] ÇATAL O，WAUTHIER S，VERBELEN T，et al. Deep active inference for autonomous robot navigation[C]//8th International Conference on Learning Representations (ICLR)，Addis Ababa:OpenReview.net，2020:1.
[121] ÇATAL O，WAUTHIER S，DE BOOM C，et al. Learning generative state space models for active inference[J]. Frontiers in Computational Neuroscience，2020，14:574372.
[122] TALBOT B，DAYOUB F，CORKE P，et al. Robot navigation in unseen spaces using an abstract map[J]. IEEE Transactions on Cognitive and Developmental Systems，2020，13:791-805.
[123] TALBOT B，LAM O，SCHULZ R，et al. Find my office:Navigating real space from semantic descriptions[C]//Proceedings of 2016 IEEE International Conference on Robotics and Automation (ICRA)，Stockholm:IEEE，2016:5782-5787.
[124] SCHULZ R，TALBOT B，LAM O，et al. Robot navigation using human cues:A robot navigation system for symbolic goal-directed exploration[C]//Proceedings of 2015 IEEE International Conference on Robotics and Automation (ICRA)，Seattle:IEEE，2015:1100-1105.
[125] SCHULZ R，TALBOT B，UPCROFT B，et al. Constructing abstract maps from spatial descriptions for goal-directed exploration[C]//Proceedings of Robotics:Science and Systems，Rome:The MIT Press，2015.
[126] ZHOU Ye，VAN KAMPEN E J，CHU Q. Hybrid hierarchical reinforcement learning for online guidance and navigation with partial observability[J]. Neurocomputing，2019，331:443-457.
[127] YU Jinglun，SU Yuancheng，LIAO Yifan. The path planning of mobile robot by neural networks and hierarchical reinforcement learning[J]. Frontiers in Neurorobotics，2020，14:63.
[128] DZHIVELIKIAN E，LATYSHEV A，KUDEROV P. et al. Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments[J]. Brain Inf.，2022，9(1):8.
[129] 王文玺，肖世德，孟祥印，等. 基于Agent的递阶强化学习模型与体系结构[J]. 机械工程学报，2010，46(2):76-82. WANG Wenxi，XIAO Shide，MENG Xiangyin，et al. Model and architecture of hierarchical reinforcement learning based on agent[J]. Journal of Mechanical Engineering，2010，46(2):76-82.
[130] SHAH D，EYSENBACH B，KAHN G，et al. Ving:Learning open-world navigation with visual goals[C]//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA)，Xi’an:IEEE，2021:13215-13222.
[131] MEZGHANI L，SUKHBAATAR S，LAVRIL T，et al. Memory-augmented reinforcement learning for image-goal navigation[C]//2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)，IEEE，2022:3316-3323.
[132] MANDERSON T，HIGUERA J C G，WAPNICK S，et al. Vision-based goal-conditioned policies for underwater navigation in the presence of obstacles[C]//14th Robotics:Science and Systems 2020，Corvalis:MIT Press，2020.
[133] NOGUCHI W，IIZUKA H，YAMAMOTO M. Navigation behavior based on self-organized spatial representation in hierarchical recurrent neural network[J]. Advanced Robotics，2019，33(11):539-549.
[134] MU Xiaokai，HE Bo，ZHANG Xin，et al. End-to-end navigation for autonomous underwater vehicle with hybrid recurrent neural networks[J]. Ocean Engineering，2019，194:106602.
[135] CHRISTOPHER J，CUEVA A，WEI Xuexin，Emergence of grid-like representations by training recurrent neural networks to perform spatial localization[C]//6th International Conference on Learning Representations，Vancouver:OpenReview.net，2018.
[136] BRAHMI H，AMMAR B，ALIMI A M. Intelligent path planning algorithm for autonomous robot based on recurrent neural networks[C]//Proceedings of 2013 International Conference on Advanced Logistics and Transport，Tunisia:IEEE，2013:199-204.
[137] YANG Guangzhong，BELLINGHAM J，DUPONT P E，et al. The grand challenges of science robotics[J]. Science Robotics，2018，3(14):eaar7650.
[138] BIANCO F，OGNIBENE D. From psychological intention recognition theories to adaptive theory of mind for robots:Computational models[C]//Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction，Cambridge:ACM，2020:136-138.
[139] CSIBRA G，GERGELY G. ‘Obsessed with goals’:Functions and mechanisms of teleological interpretation of actions in humans[J]. Acta Psychologica，2007，124(1):60-78.
[140] SOUTHGATE V，JOHNSON M H，CSIBRA G. Infants attribute goals even to biomechanically impossible actions[J]. Cognition，2008，107(3):1059-1069.
[141] CANGELOSI A，SCHLESINGER M. From babies to robots:the contribution of developmental robotics to developmental psychology[J]. Child Development Perspectives，2018，12(3):183-188.
[142] DEMIRIS Y，DEARDEN A. From motor babbling to hierarchical learning by imitation:a robot developmental pathway[C]//Proceedings of the Fifth International Workshop on Epigenetic Robotics:Modeling Cognitive Development in Robotic Systems，Lund:Lund University Cognitive Studies，2005，31-37.
[143] BIANCO F，OGNIBENE D. Functional advantages of an adaptive theory of mind for robotics:A review of current architectures[C]//11th Computer Science and Electronic Engineering (CEEC)，Colchester:IEEE，2019:139-143.
[144] GÖRÜR O C，ROSMAN B S，HOFFMAN G，et al. Toward integrating theory of mind into adaptive decision-making of social robots to understand human intention[C]//Workshop on the Role of Intentions in Human-robot Interaction at the 12th ACM/IEEE International Conference on Human-robot Interaction，Vienna:ACM，2017:1.
[145] LEMAIGNAN S，WARNIER M，SISBOT E A，et al. Artificial cognition for social human-robot interaction:An implementation[J]. Artificial Intelligence，2017，247:45-69.
[146] CHAME H F，AHMADI A，TANI J. A hybrid human-neurorobotics approach to primary intersubjectivity via active inference[J]. Frontiers in Psychology，2020，11:584869.
[147] SCHOELLER F，MILLER M，SALOMON R，et al. Trust as extended control:Human-machine interactions as active inference[J]. Frontiers in Systems Neuroscience，2021，15:93.
[148] OHATA W，TANI J. Investigation of the sense of agency in social cognition，based on frameworks of predictive coding and active inference:A simulation study on multimodal imitative interaction[J]. Frontiers in Neurorobotics，2020，14:61.
[149] CHAME H F，TANI J. Cognitive and motor compliance in intentional human-robot interaction[C]//Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE，2020:11291-11297.
[150] PENG Baolin，GALLEY M，HE Pengcheng，et al. Godel:Large-scale pre-training for goal-directed dialog[J]. arXiv preprint arXiv:2206.11309，2022.
[151] HAM D，LEE J G，Jang Y，et al. End-to-end neural pipeline for goal-oriented dialogue systems using GPT-2[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics，2020:583-592.
[152] PLAPPERT M，ANDRYCHOWICZ M，RAY A，et al. Multi-goal reinforcement learning:Challenging robotics environments and request for research[J]. arXiv preprint arXiv:1802.09464，2018.
[153] ZHU Menghui，LIU Minghuan，SHEN Jian，et al. Mapgo:Model-assisted policy optimization for goal-oriented tasks[C]//The 30th International Joint Conference on Artificial Intelligence (IJCAI)，2021:1.
[154] KHAZATSKY A，NAIR A，JING D，et al. What can I do here? learning new skills by imagining visual affordances[C]//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA)，Xi’an:IEEE，2021:14291-14297.
[155] BROCKMAN G，CHEUNG V，PETTERSSON L，et al. Openai gym[J]. arXiv preprint arXiv:1606.01540，2016.
[156] JOHNSON M，HOFMANN K，HUTTON T，et al. The Malmo platform for artificial intelligence experimentation[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence，New York:ACM，2016:4246-4247.
[157] NAIR A V，PONG V，DALAL M，et al. Visual reinforcement learning with imagined goals[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems，Montréal:Curran Associates Inc.，2018:9209-9220.
[158] NAIR S，SAVARESE S，FINN C. Goal-aware prediction:Learning to model what matters[C]//Proceedings of the 37th International Conference on Machine Learning，Vienna:PMLR，2020:7207-7219.
[159] PEZZATO C，FERRARI R，CORBATO C H. A novel adaptive controller for robot manipulators based on active inference[J]. IEEE Robotics and Automation Letters，2020，5(2):2973-2980.
[160] PEZZULO G，RIGOLI F，FRISTON K. Active inference，homeostatic regulation and adaptive behavioural control[J]. Progress in Neurobiology，2015，134:17-35.
[161] PIO-LOPEZ L，NIZARD A，FRISTON K，et al. Active inference and robot control:A case study[J]. Journal of The Royal Society Interface，2016，13(122):20160616.
[162] SCHNEIDER T，BELOUSOV B，ABDULSAMAD H，et al. Active inference for robotic manipulation[J]. arXiv preprint arXiv:2206.10313，2022.
[163] ZENG Zhen，ZHOU Zheming，SUI Zhiqiang，et al. Semantic robot programming for goal-directed manipulation in cluttered scenes[C]//Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA)，Prague:IEEE，2018:7462-7469.
[164] SUI Zhiqiang，JENKINS O C，DESINGH K. Axiomatic particle filtering for goal-directed robotic manipulation[C]//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)，Hamburg:IEEE，2015:4429-4436.
[165] SUI Zhiqiang，ZHOU Zheming，ZENG Zhen，et al. Sum:Sequential scene understanding and manipulation[C]//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)，Vancouver:IEEE，2017:3281-3288.
[166] SUI Zhiqiang，XIANG Lingzhu，JENKINS O C，et al. Goal-directed robot manipulation through axiomatic scene estimation[J]. The International Journal of Robotics Research，2017，36(1):86-104.
[167] CHEN Xiaotong，CHEN Rui，SUI Zhiqiang，et al. Grip:Generative robust inference and perception for semantic robot manipulation in adversarial environments[C]//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)，Macau:IEEE，2019:3988-3995.
[168] BRAUD R，PITTI A，GAUSSIER P. Dynamic sensorimotor model for open-ended acquisition of tool-use[C]//Proceedings of 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)，Paris:IEEE，2016:286-287.
[169] REINHART R F，STEIL J J. Reaching movement generation with a recurrent neural network based on learning inverse kinematics for the humanoid robot iCub[C]//Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robots，Paris:IEEE，2009:323-330.
[170] REINHART R F，STEIL J J. Goal-directed movement generation with a transient-based recurrent neural network controller[C]//Proceedings of 2009 Advanced Technologies for Enhanced Quality of Life，Iasi:IEEE，2009:112-117.
[171] ZHONG Shanlin，ZHOU Junjie，QIAO Hong. Bioinspired gain-modulated recurrent neural network for controlling musculoskeletal robot[J]. IEEE Transactions on Neural Networks and Learning Systems，2021(1):1-15.
[172] HUANG Xiao，WU Wei，QIAO Hong，et al. Brain-inspired motion learning in recurrent neural network with emotion modulation[J]. IEEE Transactions on Cognitive and Developmental Systems，2018，10(4):1153-1164.
[173] MORIMOTO J，DOYA K. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning[J]. Robotics and Autonomous Systems，2001，36(1):37-51.
[174] YANG Xintong，JI Ze，WU Jing，et al. Hierarchical reinforcement learning with universal policies for multistep robotic manipulation[J]. IEEE Transactions on Neural Networks and Learning Systems，2021，33(9):4727-4741.
[175] HOU Zhimin，FEI Jiajun，DENG Yuelin，et al. Data-efficient hierarchical reinforcement learning for robotic assembly control applications[J]. IEEE Transactions on Industrial Electronics，2020，68(11):11565-11575.
[176] CHENG Xiaobei，SHEN Jing，LIU Haibo，et al. Multi-robot cooperation based on hierarchical reinforcement learning[C]//Proceedings of the International Conference on Computational Science 2007，Beijing:Springer，2007:90-97.
[177] MAKAR R，MAHADEVAN S，GHAVAMZADEH M. Hierarchical multi-agent reinforcement learning[C]//Proceedings of the Fifth International Conference on Autonomous Agents，Montreal:Association for Computing Machinery，2001:246-253.
[178] LE BARS S，CHOKRON S，BALP R，et al. Theoretical perspective on an ideomotor brain-computer interface:Toward a naturalistic and non-invasive brain-computer interface paradigm based on action-effect representation[J]. Frontiers in Human Neuroscience，2021，15:732-764.
[179] MOKHTARI V，LOPES L S，PINHO A J. Experience-based robot task learning and planning with goal inference[C]//Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling，London:AAAI，2016:509-517.
[180] SANTUCCI V G，BALDASSARRE G，MIROLLI M. GRAIL:A goal-discovering robotic architecture for intrinsically-motivated learning[J]. IEEE Transactions on Cognitive and Developmental Systems，2016，8(3):214-231.
[181] KOTSERUBA I，TSOTSOS J K. 40 years of cognitive architectures:Core cognitive abilities and practical applications[J]. Artificial Intelligence Review，2020，53(1):17-94.