Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space

Perianez-Pascual, Jorge; Gutiérrez, Juan D.; Escobar-Encinas, Laura; Rubio-Largo, Álvaro; Rodriguez-Echeverria, Roberto

doi:10.3390/a18020108

dc.contributor.affiliation	Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
dc.contributor.author	Perianez-Pascual, Jorge
dc.contributor.author	Gutiérrez, Juan D.
dc.contributor.author	Escobar-Encinas, Laura
dc.contributor.author	Rubio-Largo, Álvaro
dc.contributor.author	Rodriguez-Echeverria, Roberto
dc.date.accessioned	2025-04-15T08:58:19Z
dc.date.available	2025-04-15T08:58:19Z
dc.date.issued	2025-02-16
dc.description.abstract	This paper presents a novel approach to audio classification that leverages the latent representation generated by Meta's EnCodec neural audio codec. We hypothesize that this compressed latent representation captures essential audio features and is better suited to classification tasks than traditional spectrogram-based representations. To validate this hypothesis, we train a vanilla convolutional neural network on EnCodec's encoder output for music genre, speech/music, and environmental sound classification. We then compare its performance with that of the same network trained on a spectrogram-based representation. Our experiments demonstrate that this approach achieves accuracy comparable to state-of-the-art methods while exhibiting significantly faster convergence and reduced computational load during training. These findings suggest the potential of EnCodec's latent representation for faster, more efficient, and less expensive audio classification applications. We also analyze the characteristics of EnCodec's output, providing insights into the advantages of this approach.
dc.description.peerreviewed	Yes
dc.description.sponsorship	This work was supported by Grant CPP2021-008491 funded by MICIU/AEI/10.13039/50100011033 and by the European Union NextGenerationEU/PRTR.
dc.identifier.citation	Perianez-Pascual, J.; Gutiérrez, J.D.; Escobar-Encinas, L.; Rubio-Largo, Á.; Rodriguez-Echeverria, R. Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space. Algorithms 2025, 18, 108. https://doi.org/10.3390/a18020108
dc.identifier.doi	10.3390/a18020108
dc.identifier.issn	1999-4893
dc.identifier.uri	https://hdl.handle.net/10347/40815
dc.issue.number	2
dc.journal.title	Algorithms
dc.language.iso	eng
dc.publisher	MDPI
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 20231-2023/CPP2021-008491/ES/MUSICGENIA: Una Plataforma en la Nube para de Generación de Música bajo Demanda por medio de Inteligencia Artificial/
dc.relation.publisherversion	https://doi.org/10.3390/a18020108
dc.rights	© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
dc.rights	Attribution 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Artificial Intelligence
dc.subject	Audio Classification
dc.subject	Deep Learning
dc.subject	Foundation Models
dc.title	Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space
dc.type	journal article
dc.type.hasVersion	VoR
dc.volume.number	18
dspace.entity.type	Publication
relation.isAuthorOfPublication	34f83200-7a0f-4455-a120-b9c6daf3bcd4
relation.isAuthorOfPublication.latestForDiscovery	34f83200-7a0f-4455-a120-b9c6daf3bcd4

Files

Original bundle

Name: algorithms-18-00108-v2.pdf
Size: 876.52 KB
Format: Adobe Portable Document Format

Collections

Electrónica e Computación