Exploring Open-Vocabulary Models for Category-Free Detection

García Fernández, Pablo; Mucientes Molina, Manuel; Cores Costa, Daniel
2025-11-10; 2025-11-10; 2025-09-22

Citation: Garcia-Fernandez, P., Cores, D., Mucientes, M. (2026). Exploring Open-Vocabulary Models for Category-Free Detection. In: Castrillón-Santana, M., et al. Computer Analysis of Images and Patterns. CAIP 2025. Lecture Notes in Computer Science, vol 15621. Springer, Cham. https://doi.org/10.1007/978-3-032-04968-1_24
ISBN: 978-3-032-04968-1
Handle: https://hdl.handle.net/10347/43664
Paper presented at the 21st International Conference on Computer Analysis of Images and Patterns.

Object detection models typically rely on a predefined set of categories, limiting their applicability in real-world scenarios where object classes may be unknown. In this paper, we propose a novel, training-free framework that enables off-the-shelf open-vocabulary object detectors (OvOD) to perform category-free detection: localizing and classifying objects without any prior category knowledge. Our approach leverages image captioning to dynamically generate descriptive terms directly from the image content, followed by a WordNet-based filtering process to extract semantically meaningful category names. These discovered categories are then embedded and matched with visual region features using a frozen OvOD model to perform detection. We evaluate our method on the COCO dataset in a fully zero-shot setting and demonstrate that it significantly outperforms strong multimodal large language model baselines, achieving an improvement of over 30 AP points. This highlights our method as a promising direction for more adaptive solutions to real-world detection challenges.

Language: eng
Keywords: Category-free; Open-vocabulary object detection; Captioning
Type: book part
DOI: 10.1007/978-3-032-04968-1_24
Access: open access
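The three stages named in the abstract (captioning, WordNet-based filtering, and matching category embeddings against region features) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The caption is given as a fixed string instead of being produced by a captioning model; `TOY_HYPERNYMS` is a hypothetical, hand-written stand-in for WordNet hypernym lookups; the 3-d vectors stand in for a frozen OvOD's text and region embeddings; and the function names, stopword list, `OBJECT_ROOTS` set and 0.5 similarity threshold are all assumptions chosen for illustration.

```python
import math
import re
from typing import Dict, List, Optional, Sequence

# HYPOTHETICAL toy lexicon standing in for WordNet: each noun maps to an
# illustrative hypernym chain. A real system would query WordNet itself.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["canine", "carnivore", "animal", "organism", "physical_entity"],
    "frisbee": ["toy", "artifact", "object", "physical_entity"],
    "park": ["tract", "geographic_area", "location", "physical_entity"],
    "day": ["time_period", "measure", "abstraction"],
}

# Assumption: a term counts as a detectable category if any hypernym is here.
OBJECT_ROOTS = {"organism", "artifact"}

# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "and", "with", "is"}


def candidate_terms(caption: str) -> List[str]:
    """Split a caption into unique lowercase words, dropping stopwords."""
    terms: List[str] = []
    for word in re.findall(r"[a-z]+", caption.lower()):
        if word not in STOPWORDS and word not in terms:
            terms.append(word)
    return terms


def filter_categories(terms: Sequence[str]) -> List[str]:
    """Keep terms whose (toy) hypernym chain reaches an object-like root.

    Words missing from the lexicon (here, verbs and adjectives) are discarded.
    """
    return [t for t in terms if OBJECT_ROOTS & set(TOY_HYPERNYMS.get(t, []))]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_regions(
    region_feats: Sequence[Sequence[float]],
    text_embeds: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[Optional[str]]:
    """Give each region its most similar category, or None below threshold."""
    labels: List[Optional[str]] = []
    for feat in region_feats:
        scores = {name: cosine(feat, emb) for name, emb in text_embeds.items()}
        best = max(scores, key=scores.get) if scores else None
        labels.append(best if best is not None and scores[best] >= threshold else None)
    return labels


if __name__ == "__main__":
    caption = "A dog catches a frisbee in the park on a sunny day."
    categories = filter_categories(candidate_terms(caption))
    print(categories)  # ['dog', 'frisbee']

    # HYPOTHETICAL 3-d vectors standing in for a frozen detector's text
    # embeddings (per discovered category) and region features.
    toy_text = {"dog": [1.0, 0.0, 0.0], "frisbee": [0.0, 1.0, 0.0]}
    text_embeds = {c: toy_text[c] for c in categories}
    regions = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.0, 1.0]]
    print(classify_regions(regions, text_embeds))  # ['dog', 'frisbee', None]
```

In this sketch only the category vocabulary changes from image to image, while the (simulated) embedding space stays fixed, which is what allows a pipeline of this shape to run without any training.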