Title: STDnet-ST: Spatio-temporal ConvNet for small object detection
Authors: Bosquet Mera, Brais; Mucientes Molina, Manuel; Brea Sánchez, Víctor Manuel
Record dates: 2022-06-08; 2022-06-08
Published: 2021
Citation: Pattern Recognition 116 (2021) 107929
ISSN: 0031-3203
URI: http://hdl.handle.net/10347/28783
DOI: 10.1016/j.patcog.2021.107929
Type: journal article
Language: eng
Access: open access
Keywords: Small object detection; Spatio-temporal convolutional network; Object linking

Abstract:
Object detection through convolutional neural networks is reaching unprecedented levels of precision. However, a detailed analysis of the results shows that accuracy in the detection of small objects is still far from satisfactory. A recent trend likely to improve overall object detection performance is to use spatial information alongside the temporal information in video. This paper introduces STDnet-ST, an end-to-end spatio-temporal convolutional neural network for small object detection in video. We define small as those objects under px, where the features become less distinctive. STDnet-ST is an architecture that detects small objects over time and correlates pairs of the top-ranked regions with the highest likelihood of containing those small objects. This makes it possible to link the small objects across time as tubelets. Furthermore, we propose a procedure to dismiss unprofitable object links in order to provide high-quality tubelets, increasing the accuracy. STDnet-ST is evaluated on the publicly accessible USC-GRAD-STDdb, UAVDT and VisDrone2019-VID video datasets, where it achieves state-of-the-art results for small objects.

Rights: © 2021 The Authors. Published by Elsevier B.V. This work is licenced under a CC Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0).
Licence URL: http://creativecommons.org/licenses/by-nc-nd/4.0/