RT Journal Article
T1 Short-term anchor linking and long-term self-guided attention for video object detection
A1 Cores Costa, Daniel
A1 Brea Sánchez, Víctor Manuel
A1 Mucientes Molina, Manuel
K1 Video object detection
K1 Spatio-temporal features
K1 Convolutional neural networks
AB We present a new network architecture able to take advantage of the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates short-term enhanced box features to exploit long-term spatio-temporal information. This module takes advantage of geometrical features in the long term for the first time in the video object detection domain. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that takes into account the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics, in order to prove its robustness in a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet
PB Elsevier
YR 2021
FD 2021
LK http://hdl.handle.net/10347/27098
UL http://hdl.handle.net/10347/27098
LA eng
NO Image and Vision Computing. Volume 110, June 2021, 104179
NO This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program)
DS Minerva
RD 8 Jun 2026