RT Journal Article
T1 Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos
A1 Cores Costa, Daniel
A1 Brea Sánchez, Víctor Manuel
A1 Mucientes Molina, Manuel
K1 Video object detection
K1 Small object detection
K1 Convolutional neural network
K1 Spatiotemporal CNN
AB This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI-pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double-head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classification. Finally, a long-term linking method builds long tubes from the previously calculated short tubelets to overcome detection errors. The association strategy addresses the generally low overlap between instances of small objects in consecutive frames by reducing the influence of the overlap on the final linking score. We evaluated our model on three different datasets with small objects, outperforming previous state-of-the-art spatiotemporal object detectors and our spatial baseline.
PB Springer
SN 0924-669X
YR 2022
FD 2022
LK http://hdl.handle.net/10347/29157
UL http://hdl.handle.net/10347/29157
LA eng
NO Appl Intell (2022). https://doi.org/10.1007/s10489-022-03529-w
NO Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature
DS Minerva
RD 29 Apr 2026