Short-term anchor linking and long-term self-guided attention for video object detection

Publication date

June 2021

Publisher

Elsevier

Abstract

We present a new network architecture that exploits the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates these short-term enhanced box features to exploit long-term spatio-temporal information. This module is the first in the video object detection domain to take advantage of long-term geometrical features. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and aggregated information that takes the short- and long-term temporal context into account. We have tested our proposal on five video object detection datasets with very different characteristics to prove its robustness in a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet
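For illustration, the sketch below shows one plausible way to compute attention over box features that also incorporates pairwise geometric features, in the spirit of the long-term module described in the abstract. It is not the authors' implementation (see the linked repository for that); PyTorch, the (cx, cy, w, h) box parameterization, and all names and dimensions (GeometricBoxAttention, feat_dim, key_dim) are assumptions made here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pairwise_box_geometry(ref_boxes, sup_boxes):
    """Log-space offsets (dx, dy, dw, dh) between every reference box and
    every support box; boxes are assumed to be (cx, cy, w, h) tensors."""
    rx, ry, rw, rh = (ref_boxes[:, i:i + 1] for i in range(4))  # (N, 1) each
    sx, sy, sw, sh = (sup_boxes[:, i] for i in range(4))        # (M,) each
    dx = torch.log(torch.clamp((rx - sx).abs() / rw, min=1e-3))
    dy = torch.log(torch.clamp((ry - sy).abs() / rh, min=1e-3))
    dw = torch.log(sw / rw)
    dh = torch.log(sh / rh)
    return torch.stack([dx, dy, dw, dh], dim=-1)                # (N, M, 4)

class GeometricBoxAttention(nn.Module):
    """Aggregates support-frame box features into each reference-frame box
    feature, biasing the attention weights with a learned geometric term."""
    def __init__(self, feat_dim=1024, key_dim=64):
        super().__init__()
        self.query = nn.Linear(feat_dim, key_dim)
        self.key = nn.Linear(feat_dim, key_dim)
        self.geo_bias = nn.Linear(4, 1)  # scalar bias per (ref, sup) pair
        self.scale = key_dim ** -0.5

    def forward(self, ref_feats, sup_feats, ref_boxes, sup_boxes):
        # appearance similarity between reference and support boxes: (N, M)
        logits = self.query(ref_feats) @ self.key(sup_feats).t() * self.scale
        # add the geometric bias before the softmax: (N, M)
        logits = logits + self.geo_bias(
            pairwise_box_geometry(ref_boxes, sup_boxes)).squeeze(-1)
        weights = F.softmax(logits, dim=-1)
        # weighted sum of support features -> long-term context: (N, feat_dim)
        return weights @ sup_feats
```

A hypothetical call, with reference-frame proposals attending over a pool of support-frame proposals gathered from the long-term window:

```python
attn = GeometricBoxAttention()
ref_feats, sup_feats = torch.randn(10, 1024), torch.randn(300, 1024)
ref_boxes = torch.rand(10, 4) + 0.1   # (cx, cy, w, h); keep w, h positive
sup_boxes = torch.rand(300, 4) + 0.1
context = attn(ref_feats, sup_feats, ref_boxes, sup_boxes)  # (10, 1024)
```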

Bibliographic citation

Image and Vision Computing, vol. 110, June 2021, Article 104179

Sponsors

This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and by the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).

Rights

© 2021 The Authors. Published by Elsevier B.V. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).