Exploring Open-Vocabulary Models for Category-Free Detection

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
dc.contributor.author: García Fernández, Pablo
dc.contributor.author: Mucientes Molina, Manuel
dc.contributor.author: Cores Costa, Daniel
dc.date.accessioned: 2025-11-10T13:40:13Z
dc.date.available: 2025-11-10T13:40:13Z
dc.date.issued: 2025-09-22
dc.description: Paper presented at the 21st International Conference on Computer Analysis of Images and Patterns
dc.description.abstract: Object detection models typically rely on a predefined set of categories, limiting their applicability in real-world scenarios where object classes may be unknown. In this paper, we propose a novel, training-free framework that enables off-the-shelf open-vocabulary object detectors (OvOD) to perform category-free detection: localizing and classifying objects without any prior category knowledge. Our approach leverages image captioning to dynamically generate descriptive terms directly from the image content, followed by a WordNet-based filtering process to extract semantically meaningful category names. These discovered categories are then embedded and matched with visual region features using a frozen OvOD model to perform detection. We evaluate our method on the COCO dataset in a fully zero-shot setting and demonstrate that it significantly outperforms strong multimodal large language model baselines, achieving an improvement of over 30 AP points. This highlights our method as a promising direction for more adaptive solutions to real-world detection challenges.
dc.description.sponsorship: This work was partially supported by the Spanish Ministerio de Ciencia e Innovación (grant numbers PID2020-112623GB-I00, PID2023-149549NB-I00), and the Galician Consellería de Cultura, Educación e Universidade (2024-2027 ED431G-2023/04). These grants are co-funded by the European Regional Development Fund (ERDF). Pablo Garcia-Fernandez is supported by the Spanish Ministerio de Universidades under the FPU national plan (grant number FPU21/05581).
dc.identifier.citation: Garcia-Fernandez, P., Cores, D., Mucientes, M. (2026). Exploring Open-Vocabulary Models for Category-Free Detection. In: Castrillón-Santana, M., et al. Computer Analysis of Images and Patterns. CAIP 2025. Lecture Notes in Computer Science, vol 15621. Springer, Cham. https://doi.org/10.1007/978-3-032-04968-1_24
dc.identifier.doi: 10.1007/978-3-032-04968-1_24
dc.identifier.isbn: 978-3-032-04968-1
dc.identifier.uri: https://hdl.handle.net/10347/43664
dc.language.iso: eng
dc.publisher: Springer
dc.relation.ispartofseries: Lecture Notes in Computer Science; 15621
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112623GB-I00/ES/
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-149549NB-I00/ES/
dc.relation.publisherversion: https://doi.org/10.1007/978-3-032-04968-1_24
dc.rights.accessRights: open access
dc.subject: Category-free
dc.subject: Open-vocabulary object detection
dc.subject: Captioning
dc.title: Exploring Open-Vocabulary Models for Category-Free Detection
dc.type: book part
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: b84267f3-fe5b-4ab3-aed0-cc8a3f34690b
relation.isAuthorOfPublication: 21112b72-72a3-4a96-bda4-065e7e2bb262
relation.isAuthorOfPublication: 3daa2166-1c2d-4b3d-bbb0-3d0036bd8cf2
relation.isAuthorOfPublication.latestForDiscovery: 3daa2166-1c2d-4b3d-bbb0-3d0036bd8cf2

Files

Original bundle

Name:
caip25_final_20250702092653301.pdf
Size:
920.86 KB
Format:
Adobe Portable Document Format