Short-term anchor linking and long-term self-guided attention for video object detection

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información
dc.contributor.author: Cores Costa, Daniel
dc.contributor.author: Brea Sánchez, Víctor Manuel
dc.contributor.author: Mucientes Molina, Manuel
dc.date.accessioned: 2021-11-16T07:41:19Z
dc.date.available: 2021-11-16T07:41:19Z
dc.date.issued: 2021
dc.description.abstract: We present a new network architecture able to take advantage of the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates short-term enhanced box features to exploit long-term spatio-temporal information. This module takes advantage of long-term geometrical features for the first time in the video object detection domain. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that takes into account the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics in order to prove its robustness in a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet
dc.description.peerreviewed: SI (yes)
dc.description.sponsorship: This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
dc.identifier.citation: Image and Vision Computing. Volume 110, June 2021, 104179
dc.identifier.doi: 10.1016/j.imavis.2021.104179
dc.identifier.essn: 0262-8856
dc.identifier.uri: http://hdl.handle.net/10347/27098
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TIN2017-84796-C2-1-R /ES/APORTANDO INTELIGENCIA A LOS PROCESOS DE NEGOCIO MEDIANTE SOFT COMPUTING EN ESCENARIOS DE DATOS MASIVOS
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097088-B-C32 /ES/SENSORES CMOS DE VISION, GESTION DE ENERGIA Y SEGUIMIENTO DE OBJETOS SOBRE GPUS EMPOTRADAS
dc.relation.publisherversion: https://doi.org/10.1016/j.imavis.2021.104179
dc.rights: © 2021 The Authors. Published by Elsevier B.V. This work is licenced under a CC Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0)
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Video object detection
dc.subject: Spatio-temporal features
dc.subject: Convolutional neural networks
dc.title: Short-term anchor linking and long-term self-guided attention for video object detection
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 3daa2166-1c2d-4b3d-bbb0-3d0036bd8cf2
relation.isAuthorOfPublication: 22d4aeb8-73ba-4743-a84e-9118799ab1f2
relation.isAuthorOfPublication: 21112b72-72a3-4a96-bda4-065e7e2bb262
relation.isAuthorOfPublication.latestForDiscovery: 3daa2166-1c2d-4b3d-bbb0-3d0036bd8cf2
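
The abstract above outlines two feature-aggregation steps: short-term linking of proposals that come from the same anchor box in nearby frames, and a long-term attention module that also exploits box geometry. The NumPy sketch below illustrates both ideas under simplifying assumptions; it is not the authors' SLTnet implementation. The function names are hypothetical, the short-term aggregation is assumed to be a plain mean over the linked frames, and the geometric term is assumed to be a Gaussian penalty on box-center distance added to a scaled dot-product appearance score.

```python
import numpy as np


def link_and_aggregate_short_term(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: aggregate features of proposals linked by anchor.

    features has shape (T, A, D): row [t, a] is the D-dim feature of the
    proposal that frame t regressed from anchor box a. Proposals sharing an
    anchor index are linked; the mean over T is an assumed aggregation rule.
    """
    return features.mean(axis=0)  # shape (A, D)


def box_centers(boxes: np.ndarray) -> np.ndarray:
    """Return the (cx, cy) centers of (x1, y1, x2, y2) boxes, shape (N, 2)."""
    return np.stack(
        [(boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2], axis=1
    )


def long_term_attention(ref_feats, ref_boxes, mem_feats, mem_boxes, sigma=50.0):
    """Hypothetical helper: enhance reference boxes with distant-frame boxes.

    Attention logits = scaled dot-product appearance similarity plus a
    geometric bias that penalizes large center displacements (assumed form).
    The attended memory features are added to the reference features.
    """
    d = ref_feats.shape[1]
    appearance = ref_feats @ mem_feats.T / np.sqrt(d)  # (N, M)
    diff = box_centers(ref_boxes)[:, None, :] - box_centers(mem_boxes)[None, :, :]
    geometric = -np.sum(diff**2, axis=-1) / (2 * sigma**2)  # (N, M)
    logits = appearance + geometric
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over memory
    return ref_feats + weights @ mem_feats  # residual update, shape (N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy sizes: 3 nearby frames, 4 anchors, 8-dim features.
    short = link_and_aggregate_short_term(rng.normal(size=(3, 4, 8)))
    boxes = np.array(
        [[0, 0, 10, 10], [20, 20, 40, 40], [5, 5, 15, 15], [50, 50, 60, 60]], float
    )
    # Short-term enhanced features of 6 boxes from distant frames (toy data).
    mem_feats = link_and_aggregate_short_term(rng.normal(size=(3, 6, 8)))
    mem_boxes = np.tile(boxes[:3], (2, 1))
    enhanced = long_term_attention(short, boxes, mem_feats, mem_boxes)
    print(enhanced.shape)  # (4, 8)
```

In this sketch, linking by anchor index means the short-term step needs no explicit tracker: the same anchor slot is simply read in each nearby frame. The long-term step then lets every reference box draw on short-term enhanced boxes from distant frames, with the geometric bias favoring boxes that lie close to it in the image.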

Files

Original bundle

Name: 2021_elsevier_image_cores_short.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format
Description: Research article (Artigo de investigación)