• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2019, Vol. 55 ›› Issue (11): 123-130. doi: 10.3901/JME.2019.11.123

• Invited Column: Tri-Co Robots (共融机器人)

Visual Attentional Network and Learning Method for Object Search and Recognition

LÜ Jie, LUO Fangying, YUAN Zejian

  1. School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049
  • Received: 2018-06-20 Revised: 2019-03-19 Online: 2019-06-05 Published: 2019-06-05
  • Corresponding author: YUAN Zejian, male, born in 1971, associate professor. His main research interests are computer vision and machine learning. E-mail: yuan.ze.jian@xjtu.edu.cn
  • About the authors: LÜ Jie, male, born in 1992. His main research interest is computer vision. E-mail: jiejielyu@outlook.com; LUO Fangying, female, born in 1996, master's student. Her main research interest is computer vision. E-mail: fangying.luo@foxmail.com
  • Supported by:
    the National Natural Science Foundation of China (91648121, 61573280) and the National Key Research Program of China (2016YFB001001).

Abstract: A recurrent visual attention network is proposed to search for and recognize an object simultaneously. The network automatically selects a sequence of local observations from an image, then localizes and recognizes the visual object with high accuracy by fusing detailed local appearance with coarse contextual visual information. This makes object search more efficient than conventional methods based on sliding windows or full-image convolution. In addition, a hybrid loss function is proposed to learn the network parameters end-to-end in a multi-task manner. In particular, a strategy combining stochasticity and object awareness is introduced into the fixation-sequence loss, which helps mine richer context and drives the fixation point toward the visual object quickly. A real-world dataset is built to evaluate the model on searching for and recognizing objects of interest, including small objects. Experimental results show that, with only a few fixation shifts, the method predicts an accurate bounding box for a visual object in an image and achieves a relatively high search speed on large images. The source code will be released for verification and comparative analysis of the method.

Key words: attentional model, fixation strategy, object detection, reinforcement learning

CLC number: