STDnet: Exploiting high resolution feature maps for small object detection

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Bosquet Mera, Brais
dc.contributor.author: Mucientes Molina, Manuel
dc.contributor.author: Brea Sánchez, Víctor Manuel
dc.date.accessioned: 2021-03-09T08:04:06Z
dc.date.available: 2022-03-25T02:00:09Z
dc.date.issued: 2020
dc.description.abstract: The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular challenges such as MS COCO. This gap is partly caused by the lack of specific architectures and of datasets containing a sufficiently large number of small objects. Our work addresses both issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, which we define as those smaller than 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, the Region Context Network (RCN), which selects the most promising regions and discards the rest of the input image. Processing only these regions allows STDnet to keep high-resolution feature maps in deeper layers with low memory overhead and higher frame rates. High-resolution feature maps proved to be key to increasing localization accuracy for such small objects. Second, we present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet raises the AP@.5 for small target detection from 50.8%, achieved by the best state-of-the-art object detectors, to 57.4%. Performance has also been evaluated on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, is proposed to exploit the information in successive frames, increasing the AP@.5 of STDnet by 2.3%. Finally, optimizations have been carried out so that the network fits on embedded devices such as the Jetson TX2.
dc.description.peerreviewed: Yes
dc.description.sponsorship: This research was funded by Gradiant, Spain, and partially funded by the Spanish Ministry of Economy and Competitiveness under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32 (MICINN), and by the Galician Ministry of Education, Culture and Universities, Spain, under grant ED431G/08. Brais Bosquet is supported by the Galician Ministry of Education, Culture and Universities, Spain. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
dc.identifier.citation: Engineering Applications of Artificial Intelligence, Volume 91, May 2020, 103615
dc.identifier.doi: 10.1016/j.engappai.2020.103615
dc.identifier.issn: 0952-1976
dc.identifier.uri: http://hdl.handle.net/10347/24671
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TIN2017-84796-C2-1-R/ES/APORTANDO INTELIGENCIA A LOS PROCESOS DE NEGOCIO MEDIANTE SOFT COMPUTING EN ESCENARIOS DE DATOS MASIVOS
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-097088-B-C32/ES/SENSORES CMOS DE VISION, GESTION DE ENERGIA Y SEGUIMIENTO DE OBJETOS SOBRE GPUS EMPOTRADAS
dc.relation.publisherversion: https://doi.org/10.1016/j.engappai.2020.103615
dc.rights: © 2020 Elsevier Ltd. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Small object detection
dc.subject: Convolutional neural networks (ConvNets)
dc.subject: Deep learning
dc.title: STDnet: Exploiting high resolution feature maps for small object detection
dc.type: journal article
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: 21112b72-72a3-4a96-bda4-065e7e2bb262
relation.isAuthorOfPublication: 22d4aeb8-73ba-4743-a84e-9118799ab1f2
relation.isAuthorOfPublication.latestForDiscovery: 21112b72-72a3-4a96-bda4-065e7e2bb262

Files

Original bundle

Name: 2020_engappai_bosquet_stdnet.pdf
Size: 15.83 MB
Format: Adobe Portable Document Format