Visual Attentional Network and Learning Method for Object Search and Recognition

doi:10.3901/JME.2019.11.123

Journal of Mechanical Engineering, 2019, Vol. 55, Issue (11): 123-130. doi: 10.3901/JME.2019.11.123

• Invited Column: Tri-Co Robots •

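LÜ Jie, LUO Fangying, YUAN Zejian

School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049

Received: 2018-06-20 Revised: 2019-03-19 Online: 2019-06-05 Published: 2019-06-05
Corresponding author: YUAN Zejian, male, born in 1971, associate professor. His main research interests are computer vision and machine learning. E-mail: yuan.ze.jian@xjtu.edu.cn
About the authors: LÜ Jie, male, born in 1992. His main research interest is computer vision. E-mail: jiejielyu@outlook.com; LUO Fangying, female, born in 1996, master's student. Her main research interest is computer vision. E-mail: fangying.luo@foxmail.com
Funding:
Supported by the National Natural Science Foundation of China (91648121, 61573280) and the National Key Research Program of China (2016YFB001001).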

Abstract

A recurrent visual attention network is proposed to search for and recognize an object simultaneously. The network automatically selects a sequence of local observations from an image and, by fusing detailed local appearance with coarse contextual visual information, localizes and recognizes visual objects with high accuracy. It searches more efficiently than conventional methods based on sliding windows or convolution over the whole image. In addition, a hybrid loss function is proposed for end-to-end multi-task learning of the network parameters. In particular, a strategy combining stochasticity with object awareness is introduced into the visual fixation sequence loss, which helps mine richer contextual information and ensures that the fixation point approaches the visual object quickly. A real-world dataset is built to evaluate the model on searching for and recognizing objects of interest, including small objects. Experimental results show that, with only a few fixation shifts, the method can predict an accurate bounding box for a visual object in an image and achieves a relatively high search speed on large images. The source code will be released to support verification and comparative analysis of the method.

Key words: attentional model, fixation strategy, object detection, reinforcement learning
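The abstract describes each local observation as a fusion of detailed local appearance and coarse context around a fixation point. The following is a minimal sketch of one common way to build such an observation, a multi-scale "glimpse" in the spirit of recurrent attention models. It is not the authors' implementation: the function name `extract_glimpse`, the patch size, the number of scales, the zero padding, and the average pooling are all assumptions made for illustration.

```python
import numpy as np


def extract_glimpse(image, center, size=8, num_scales=2):
    """Hypothetical multi-scale glimpse around a fixation point.

    image:  2-D array (grayscale image).
    center: (row, col) fixation point in pixel coordinates.
    Returns an array of shape (num_scales, size, size). Scale 0 is a
    full-resolution crop (local detail); each further scale doubles the
    crop's side length and average-pools it back to (size, size), which
    gives progressively coarser context.
    """
    assert size % 2 == 0, "this sketch assumes an even patch size"
    cy, cx = center
    patches = []
    for s in range(num_scales):
        factor = 2 ** s
        half = size * factor // 2
        # Zero-pad so crops near the image border stay in bounds.
        padded = np.pad(image, half, mode="constant")
        # Pixel (cy, cx) sits at (cy + half, cx + half) after padding,
        # so the crop covering [cy - half, cy + half) starts at (cy, cx).
        crop = padded[cy:cy + 2 * half, cx:cx + 2 * half]
        # Average-pool non-overlapping factor x factor blocks.
        pooled = crop.reshape(size, factor, size, factor).mean(axis=(1, 3))
        patches.append(pooled)
    return np.stack(patches)
```

Under these assumptions, a recurrent network would consume one such stack per step and emit the next fixation point, so only `num_scales * size * size` values are processed per step regardless of the image size.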

CLC number: TG156

LÜ Jie, LUO Fangying, YUAN Zejian. Visual Attentional Network and Learning Method for Object Search and Recognition[J]. Journal of Mechanical Engineering, 2019, 55(11): 123-130.
