RT Journal Article
T1 Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos
A1 Cores Costa, Daniel
A1 Brea Sánchez, Víctor Manuel
A1 Mucientes Molina, Manuel
K1 Video object detection
K1 Small object detection
K1 Convolutional neural network
K1 Spatiotemporal CNN
AB This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classification. Finally, a long-term linking method builds long tubes with the previously calculated short tubelets to overcome detection errors. The association strategy addresses the generally low overlap between instances of small objects in consecutive frames by reducing the influence of the overlap in the final linking score. We evaluated our model in three different datasets with small objects, outperforming previous state-of-the-art spatiotemporal object detectors and our spatial baseline.
PB Springer
SN 0924-669X
YR 2022
FD 2022
LK http://hdl.handle.net/10347/29157
UL http://hdl.handle.net/10347/29157
LA eng
NO Appl Intell (2022). https://doi.org/10.1007/s10489-022-03529-w
NO Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature
DS Minerva
RD 28 Apr 2026