STDnet: Exploiting high resolution feature maps for small object detection

Bosquet Mera, Brais; Mucientes Molina, Manuel; Brea Sánchez, Víctor Manuel

doi:10.1016/j.engappai.2020.103615

Files

2020_engappai_bosquet_stdnet.pdf (15.83 MB)

Identifiers

URI: http://hdl.handle.net/10347/24671

ISSN: 0952-1976

DOI: 10.1016/j.engappai.2020.103615

Publication date

2020

Authors

Bosquet Mera, Brais

Mucientes Molina, Manuel

Brea Sánchez, Víctor Manuel

Publisher

Elsevier

Abstract

The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular contests such as MS COCO. This is caused in part by the lack of specific architectures and of datasets with a sufficiently large number of small objects. Our work addresses both issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, which we define as those under 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, called Region Context Network (RCN), which chooses the most promising regions and discards the rest of the input image. Processing only specific areas allows STDnet to keep high resolution feature maps in deeper layers with low memory overhead and higher frame rates. High resolution feature maps proved to be key to increasing localization accuracy for such small objects. Second, we present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet improves the AP@.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, is proposed to make use of the information in successive frames, increasing the AP@.5 of STDnet by 2.3%. Finally, optimizations have been carried out so that the network fits on embedded devices such as the Jetson TX2.
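
The two quantities the abstract relies on, the "under 16 × 16 pixels" size criterion and the 0.5 overlap threshold behind AP@.5, can be illustrated with a minimal sketch. This is not code from the paper: the names `Box`, `is_small`, `iou`, and `is_true_positive` are hypothetical, and reading "under 16 × 16 pixels" as a box area below 256 pixels is an assumption made here for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    # Clamp at zero so a degenerate box never yields a negative area.
    return max(0.0, box.x2 - box.x1) * max(0.0, box.y2 - box.y1)


def is_small(box: Box, limit: float = 16.0) -> bool:
    # Assumption: "under 16 x 16 pixels" means area strictly below limit * limit.
    return area(box) < limit * limit


def iou(a: Box, b: Box) -> float:
    # Intersection over union of two axis-aligned boxes.
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detection: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    # AP@.5 counts a detection as a match when its IoU with the ground truth is at least 0.5.
    return iou(detection, ground_truth) >= threshold


if __name__ == "__main__":
    ground_truth = Box(0, 0, 8, 8)
    shifted = Box(2, 2, 10, 10)  # same 8 x 8 box, moved 2 pixels along each axis
    print(is_small(ground_truth), round(iou(ground_truth, shifted), 3))
    print(is_true_positive(shifted, ground_truth))
```

In this sketch, an 8 × 8 box shifted by 2 pixels along each axis has an IoU of 36/92 ≈ 0.39 with the original, so it would not count as a match at the 0.5 threshold. This is one way to see why localization accuracy matters so much for objects of this size.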

Keywords

Small object detection | Convolutional neural networks (ConvNets) | Deep learning

Bibliographic citation

Engineering Applications of Artificial Intelligence, Volume 91, May 2020, 103615

Publisher version

https://doi.org/10.1016/j.engappai.2020.103615

Rights

© 2020 Elsevier Ltd. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación
