STDnet: Exploiting high resolution feature maps for small object detection
Loading...
Identifiers
Publication date
Advisors
Tutors
Editors
Journal Title
Journal ISSN
Volume Title
Publisher
Elsevier
Abstract
The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects. This can be observed in popular contests like MS COCO. This is in part caused by the lack of specific architectures and datasets with a sufficiently large number of small objects. Our work aims at these two issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects that we defined as those under 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, called Region Context Network (RCN), to choose the most promising regions, while discarding the rest of the input image. Processing only specific areas allows STDnet to keep high resolution feature maps in deeper layers providing low memory overhead and higher frame rates. High resolution feature maps were proved to be key to increasing localization accuracy in such small objects. Second, we also present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results over USC-GRAD-STDdb show that STDnet improves the AP@.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested in MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, has been proposed to make use of the information of successive frames, increasing the AP@.5 of STDnet in 2.3%. Finally, optimizations have been carried out to be fit on embedded devices such as Jetson TX2
Description
Bibliographic citation
Engineering Applications of Artificial Intelligence, Volume 91, May 2020, 103615
Relation
Has part
Has version
Is based on
Is part of
Is referenced by
Is version of
Requires
Publisher version
https://doi.org/10.1016/j.engappai.2020.103615Sponsors
This research was funded by Gradiant, Spain, and also partially funded by the Spanish Ministry of Economy and Competitiveness under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32 (MICINN), and the Galician Ministry of Education, Culture and Universities, Spain under grant ED431G/08. Brais Bosquet is supported by the Galician Ministry of Education, Culture and Universities, Spain . These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program)
Rights
© 2020 Elsevier Ltd. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Attribution-NonCommercial-NoDerivatives 4.0 Internacional








