
Transformer visual object tracking algorithm based on mixed attention

Cited by: 4
Abstract: Transformer-based visual object tracking algorithms capture the global information of the target well, but there is room for further improvement in how object features are represented. To strengthen the expressive power of object features, a Transformer visual object tracking algorithm based on mixed attention is proposed. First, a mixed attention module is introduced to capture target features in both the spatial and channel dimensions, modeling the contextual dependencies among target features. Second, the feature maps are sampled by multiple parallel dilated convolutions with different dilation rates to obtain multi-scale image features and enhance local feature representation. Finally, a convolutional position encoding layer is added to the Transformer encoder to provide the tracker with accurate, length-adaptive position encoding, improving localization accuracy. Extensive experiments on the OTB 100, VOT 2018, and LaSOT datasets show that learning the relationships between features with the mixed-attention Transformer network yields better representations of target features. Compared with other mainstream object tracking algorithms, the proposed algorithm achieves better tracking performance and runs in real time at 26 frames per second.
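The abstract names three components but does not give their exact architectures. A minimal PyTorch sketch of one plausible interpretation follows, assuming a CBAM-style channel-then-spatial attention for the mixed attention module, ASPP-style parallel dilated convolutions for the multi-scale branch, and a CPVT-style depthwise-convolution positional encoding; these are illustrative stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel + spatial attention over a feature map (CBAM-style sketch;
    the paper's exact design is not specified in the abstract)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel branch: squeeze spatial dims, produce per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial branch: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)            # channel re-weighting
        avg = x.mean(dim=1, keepdim=True)      # per-pixel channel mean
        mx, _ = x.max(dim=1, keepdim=True)     # per-pixel channel max
        return x * self.spatial_conv(torch.cat([avg, mx], dim=1))

class MultiScaleDilated(nn.Module):
    """Parallel dilated convolutions with different rates (ASPP-style sketch).
    padding=rate keeps the spatial size constant for 3x3 kernels."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class ConvPositionalEncoding(nn.Module):
    """Depthwise-conv positional encoding (CPVT-style sketch): the position
    signal comes from a local convolution rather than a fixed-size table,
    so it adapts to any input length/resolution."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):
        return x + self.proj(x)

# Dummy backbone feature map standing in for the template/search features.
feat = torch.randn(1, 64, 20, 20)
x = MixedAttention(64)(feat)
x = MultiScaleDilated(64)(x)
x = ConvPositionalEncoding(64)(x)
print(x.shape)  # torch.Size([1, 64, 20, 20])
```

All three blocks preserve the feature-map shape, so they can be dropped between a Siamese backbone and the Transformer encoder without changing the surrounding tensor interfaces.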
Authors: HOU Zhi-qiang, GUO Fan, YANG Xiao-lin, MA Su-gang, FAN Jiu-lun (School of Computer, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; School of Communication and Information Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China)
Source: Control and Decision (EI, CSCD, PKU Core), 2024, No. 3, pp. 739-748 (10 pages)
Funding: National Natural Science Foundation of China (62072370)
Keywords: computer vision; object tracking; Siamese network; deep learning; attention mechanism; Transformer