STDnet: Exploiting high resolution feature maps for small object detection

Bosquet Mera, Brais; Mucientes Molina, Manuel; Brea Sánchez, Víctor Manuel

doi:10.1016/j.engappai.2020.103615

dc.contributor.affiliation	Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información	gl
dc.contributor.affiliation	Universidade de Santiago de Compostela. Departamento de Electrónica e Computación	gl
dc.contributor.area	Área de Enxeñaría e Arquitectura
dc.contributor.author	Bosquet Mera, Brais
dc.contributor.author	Mucientes Molina, Manuel
dc.contributor.author	Brea Sánchez, Víctor Manuel
dc.date.accessioned	2021-03-09T08:04:06Z
dc.date.available	2022-03-25T02:00:09Z
dc.date.issued	2020
dc.description.abstract	The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular challenges such as MS COCO. This is partly caused by the lack of specific architectures and of datasets with a sufficiently large number of small objects. Our work addresses these two issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, which we define as those under 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, called Region Context Network (RCN), which chooses the most promising regions while discarding the rest of the input image. Processing only specific areas allows STDnet to keep high resolution feature maps in deeper layers with low memory overhead and higher frame rates. High resolution feature maps proved to be key to increasing localization accuracy for such small objects. Second, we also present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet improves the AP@.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, has been proposed to make use of the information in successive frames, increasing the AP@.5 of STDnet by 2.3%. Finally, optimizations have been carried out so that the network fits on embedded devices such as the Jetson TX2.	gl
dc.description.peerreviewed	SI	gl
dc.description.sponsorship	This research was funded by Gradiant, Spain, and also partially funded by the Spanish Ministry of Economy and Competitiveness under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32 (MICINN), and the Galician Ministry of Education, Culture and Universities, Spain, under grant ED431G/08. Brais Bosquet is supported by the Galician Ministry of Education, Culture and Universities, Spain. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).	gl
dc.identifier.citation	Engineering Applications of Artificial Intelligence, Volume 91, May 2020, 103615	gl
dc.identifier.doi	10.1016/j.engappai.2020.103615
dc.identifier.issn	0952-1976
dc.identifier.uri	http://hdl.handle.net/10347/24671
dc.language.iso	eng	gl
dc.publisher	Elsevier	gl
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TIN2017-84796-C2-1-R/ES/APORTANDO INTELIGENCIA A LOS PROCESOS DE NEGOCIO MEDIANTE SOFT COMPUTING EN ESCENARIOS DE DATOS MASIVOS
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097088-B-C32/ES/SENSORES CMOS DE VISION, GESTION DE ENERGIA Y SEGUIMIENTO DE OBJETOS SOBRE GPUS EMPOTRADAS
dc.relation.publisherversion	https://doi.org/10.1016/j.engappai.2020.103615	gl
dc.rights	© 2020 Elsevier Ltd. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/)	gl
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 Internacional
dc.rights.accessRights	open access	gl
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Small object detection	gl
dc.subject	Convolutional neural networks (ConvNets)	gl
dc.subject	Deep learning	gl
dc.title	STDnet: Exploiting high resolution feature maps for small object detection	gl
dc.type	journal article	gl
dc.type.hasVersion	AM	gl
dspace.entity.type	Publication
relation.isAuthorOfPublication	21112b72-72a3-4a96-bda4-065e7e2bb262
relation.isAuthorOfPublication	22d4aeb8-73ba-4743-a84e-9118799ab1f2
relation.isAuthorOfPublication.latestForDiscovery	21112b72-72a3-4a96-bda4-065e7e2bb262

Files

Original bundle

Name:: 2020_engappai_bosquet_stdnet.pdf
Size:: 15.83 MB
Format:: Adobe Portable Document Format

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación