Spatio-temporal convolutional neural networks for video object detection
Abstract
The object detection problem comprises two main tasks: object localization and object classification. Detection
precision in images has greatly improved with the use of Deep Learning techniques, especially with the adoption of
Convolutional Neural Networks. However, object detection in videos presents new challenges, such as motion blur,
defocus, and object occlusion, that degrade object features in specific frames. Moreover, traditional object
detectors do not exploit spatio-temporal information, which can be crucial for addressing these challenges and
boosting detection precision. Hence, new object detection frameworks designed specifically for videos are needed to
replicate the success achieved in the single-image domain. The availability of spatio-temporal information makes it
possible to analyze long- and short-term relations among detections at different time steps. This greatly improves
object classification precision in deteriorated frames, in which a single-image object detector would not be able
to provide the correct object category. We propose new methods to establish these relations and aggregate
information from different frames, and show experimentally that they improve on both the single-image baseline and
previous video object detectors. In addition, we explore the use of spatio-temporal information to reduce the
number of training examples required while maintaining competitive detection precision. This makes it possible to
apply our proposal in domains where training data is scarce and, more generally, reduces annotation costs.
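
The idea of relating detections across time steps and aggregating their information can be illustrated with a minimal sketch. This is not the method proposed in the thesis: it assumes a generic scheme in which detections in consecutive frames are linked greedily by box overlap (IoU) and the per-class scores along the resulting chain are averaged. The function names `iou`, `link_tubelet`, and `aggregate_scores`, and the 0.5 overlap threshold, are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelet(frames: List[List[Box]], start_index: int,
                 min_iou: float = 0.5) -> List[int]:
    """Follow one detection from the first frame onwards.

    Returns the index of the linked box in each frame, stopping at the
    first frame where no box overlaps the previous one by at least min_iou.
    The greedy best-overlap rule is an assumption of this sketch.
    """
    path = [start_index]
    current = frames[0][start_index]
    for boxes in frames[1:]:
        best_idx, best_iou = -1, min_iou
        for idx, box in enumerate(boxes):
            overlap = iou(current, box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx < 0:
            break  # chain is broken; no sufficiently overlapping box
        path.append(best_idx)
        current = boxes[best_idx]
    return path


def aggregate_scores(score_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all linked detections."""
    n = len(score_rows)
    return [sum(column) / n for column in zip(*score_rows)]


if __name__ == "__main__":
    # Three frames with one detection each; the middle frame is "deteriorated"
    # and, taken alone, would favour the wrong class (index 1).
    frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(2, 2, 12, 12)]]
    scores = [[[0.9, 0.1]], [[0.4, 0.6]], [[0.8, 0.2]]]
    path = link_tubelet(frames, 0)
    linked = [scores[t][i] for t, i in enumerate(path)]
    print(path, aggregate_scores(linked))
```

In this toy input the middle frame on its own would be assigned the second class, while the averaged scores favour the first class, which is the kind of correction that temporal aggregation aims for.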
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International








