RT Dissertation/Thesis
T1 Spatio-temporal convolutional neural networks for video object detection
A1 Cores Costa, Daniel
K1 video object detection
K1 convolutional neural networks
AB The object detection problem is composed of two main tasks: object localization and object classification. The detection precision in images has greatly improved with the use of Deep Learning techniques, especially with the adoption of Convolutional Neural Networks. However, object detection in videos presents new challenges, such as motion blur, defocus, and object occlusions, that deteriorate object features in some specific frames. Moreover, traditional object detectors do not exploit spatio-temporal information that can be crucial to address these new challenges and boost the detection precision. Hence, new object detection frameworks specifically designed for videos are needed to replicate the success achieved in the single-image domain. The availability of spatio-temporal information unlocks the possibility of analyzing long- and short-term relations among detections at different time steps. This greatly improves the object classification precision in deteriorated frames in which a single-image object detector would not be able to provide the correct object category. We propose new methods to establish these relations and aggregate information from different frames, proving through experimentation that they improve on the single-image baseline and on previous video object detectors. In addition, we explore the utility of spatio-temporal information to reduce the number of training examples while keeping a competitive detection precision. Thus, this approach makes it possible to apply our proposal in domains in which training data is scarce, and it generally reduces annotation costs.
YR 2022
FD 2022
LK http://hdl.handle.net/10347/29792
UL http://hdl.handle.net/10347/29792
LA eng
DS Minerva
RD 3 may 2026