Tracking more than 100 arbitrary objects at 25 FPS through deep learning

Authors: Vaquero Otal, Lorenzo; Brea Sánchez, Víctor Manuel; Mucientes Molina, Manuel
Dates: 2021-08-12; 2022
Citation: Pattern Recognition 2022, 121: 108205. https://doi.org/10.1016/j.patcog.2021.108205
ISSN: 0031-3203
Handle: http://hdl.handle.net/10347/26777

Abstract: Most video analytics applications rely on object detectors to localize objects in frames. However, when real-time operation is required, running the detector on every frame is usually not possible. This is somewhat circumvented by instantiating visual object trackers between detector calls, but this does not scale with the number of objects. To tackle this problem, we present SiamMT, a new deep learning multiple visual object tracking solution that applies single-object tracking principles to multiple arbitrary objects in real-time. To achieve this, SiamMT reuses feature computations, implements a novel crop-and-resize operator, and defines a new and efficient pairwise similarity operator. SiamMT naturally scales up to several dozen targets, reaching 25 fps with 122 simultaneous objects for VGA videos, or up to 100 simultaneous objects in HD720 video. SiamMT has been validated on five large real-time benchmarks, achieving leading performance against current state-of-the-art trackers.

Language: eng
Rights: © 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Attribution-NonCommercial-NoDerivatives 4.0 International.
Keywords: Multiple visual object tracking; Motion estimation; Deep learning; Siamese networks
Funding: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112623GB-I00/ES
Funding: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097088-B-C32/ES/SENSORES CMOS DE VISION, GESTION DE ENERGIA Y SEGUIMIENTO DE OBJETOS SOBRE GPUS EMPOTRADAS
Type: journal article
Access: open access
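
Illustrative sketch: the abstract attributes SiamMT's scalability to reusing feature computations and to cropping and comparing per-target regions. The plain-Python example below is only a rough illustration of that general idea, not the paper's operators. It assumes a single-channel feature map computed once per frame, crops without any resizing, and scores each target with a sliding dot product. All function names (`crop`, `cross_correlate`, `best_offset`, `track_all`) are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def crop(feature_map: Grid, top: int, left: int, height: int, width: int) -> Grid:
    """Cut a window out of the shared feature map (no resampling here)."""
    return [row[left:left + width] for row in feature_map[top:top + height]]


def cross_correlate(search: Grid, exemplar: Grid) -> Grid:
    """Slide the exemplar over the search window; each score is a dot product."""
    eh, ew = len(exemplar), len(exemplar[0])
    sh, sw = len(search), len(search[0])
    return [
        [
            sum(search[y + i][x + j] * exemplar[i][j]
                for i in range(eh) for j in range(ew))
            for x in range(sw - ew + 1)
        ]
        for y in range(sh - eh + 1)
    ]


def best_offset(scores: Grid) -> Tuple[int, int]:
    """Return the (row, col) position of the highest similarity score."""
    positions = ((y, x) for y in range(len(scores)) for x in range(len(scores[0])))
    return max(positions, key=lambda p: scores[p[0]][p[1]])


def track_all(frame_features: Grid, exemplars: List[Grid],
              search_boxes: List[Box]) -> List[Tuple[int, int]]:
    """Locate every target on one shared feature map for the frame.

    The expensive step (feature extraction) is assumed to have run once per
    frame; each target only pays for a crop and a small correlation.
    """
    results = []
    for exemplar, (top, left, height, width) in zip(exemplars, search_boxes):
        search = crop(frame_features, top, left, height, width)
        dy, dx = best_offset(cross_correlate(search, exemplar))
        results.append((top + dy, left + dx))  # back to frame coordinates
    return results
```

In this sketch the per-target cost depends only on the crop size, so adding targets does not repeat the per-frame feature computation. A real tracker would use multi-channel deep features, resampling to a fixed crop size, and batched GPU execution.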