RT Journal Article
T1 STDnet: Exploiting high resolution feature maps for small object detection
A1 Bosquet Mera, Brais
A1 Mucientes Molina, Manuel
A1 Brea Sánchez, Víctor Manuel
K1 Small object detection
K1 Convolutional neural networks (ConvNets)
K1 Deep learning
AB The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular contests like MS COCO. This is caused in part by the lack of specific architectures and of datasets with a sufficiently large number of small objects. Our work addresses these two issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, which we define as those under 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, called Region Context Network (RCN), which chooses the most promising regions while discarding the rest of the input image. Processing only specific areas allows STDnet to keep high resolution feature maps in deeper layers, with low memory overhead and higher frame rates. High resolution feature maps proved to be key to increasing localization accuracy for such small objects. Second, we present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet improves the AP@.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, is proposed to make use of the information in successive frames, increasing the AP@.5 of STDnet by 2.3%. Finally, optimizations have been carried out so that the network fits on embedded devices such as the Jetson TX2.
PB Elsevier
SN 0952-1976
YR 2020
FD 2020
LK http://hdl.handle.net/10347/24671
UL http://hdl.handle.net/10347/24671
LA eng
NO Engineering Applications of Artificial Intelligence, Volume 91, May 2020, 103615
NO This research was funded by Gradiant, Spain, and also partially funded by the Spanish Ministry of Economy and Competitiveness under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32 (MICINN), and the Galician Ministry of Education, Culture and Universities, Spain under grant ED431G/08. Brais Bosquet is supported by the Galician Ministry of Education, Culture and Universities, Spain. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
DS Minerva
RD 30 Apr 2026