Relation networks for few-shot video object detection

Authors: Cores Costa, Daniel; Seidenari, Lorenzo; Bimbo, Alberto del; Brea Sánchez, Víctor Manuel; Mucientes Molina, Manuel
Dates: 2025-11-17; 2025-11-17; 2023-06-25
Citation: Cores, D., Seidenari, L., Bimbo, A.D., Brea, V.M., Mucientes, M. (2023). Relation Networks for Few-Shot Video Object Detection. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_19
ISBN: 978-3-031-36616-1
Handle: https://hdl.handle.net/10347/43854
Abstract: This paper describes a new few-shot video object detection framework that leverages spatio-temporal information through a relation module with attention mechanisms to mine relationships among proposals in different frames. The output of the relation module feeds a spatio-temporal double head with a category-agnostic confidence predictor, which decreases overfitting and thereby addresses the reduced training sets inherent to few-shot solutions. The predicted score is the input to a long-term object linking approach that provides object tubes across the whole video, which ensures spatio-temporal consistency. Our proposal establishes a new state of the art on the FSVOD500 dataset.
Language: eng
Keywords: Few-shot object detection; Video object detection
Subject: 120304 Artificial intelligence
Type: book part
DOI: 10.1007/978-3-031-36616-1_19
Access: open access
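
The abstract describes a relation module that uses attention to relate proposals across frames. This record does not give the module's actual design, so the following is only a minimal sketch of the general idea, assuming plain scaled dot-product attention with a residual connection and no learned projections. The function names (`relate`, `softmax`, `dot`) and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relate(current, support):
    """Enhance current-frame proposal features with support-frame context.

    ASSUMPTION: plain scaled dot-product attention plus a residual sum;
    the paper's module may differ (e.g. learned projections, several heads).
    current: list of feature vectors for proposals in the current frame.
    support: list of feature vectors for proposals in other frames.
    """
    d = len(current[0])
    scale = math.sqrt(d)
    enhanced = []
    for q in current:
        # Attention weight of every support proposal for this proposal.
        weights = softmax([dot(q, k) / scale for k in support])
        # Weighted sum of support features (the attended context).
        context = [sum(w * k[i] for w, k in zip(weights, support))
                   for i in range(d)]
        # Residual connection: original feature plus attended context.
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced


if __name__ == "__main__":
    current_frame = [[1.0, 0.0], [0.0, 1.0]]   # toy proposal features
    other_frames = [[1.0, 0.0], [0.5, 0.5]]    # toy proposals from other frames
    for feature in relate(current_frame, other_frames):
        print(feature)
```

In a full detector of the kind the abstract outlines, the enhanced features would feed the classification and localization heads. Here they are simply printed, to show that each proposal's feature is pulled towards the support proposals it most resembles.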