RT Journal Article
T1 Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space
A1 Perianez-Pascual, Jorge
A1 Gutiérrez, Juan D.
A1 Escobar-Encinas, Laura
A1 Rubio-Largo, Álvaro
A1 Rodriguez-Echeverria, Roberto
K1 Artificial Intelligence
K1 Audio Classification
K1 Deep Learning
K1 Foundation Models
AB This paper presents a novel approach to audio classification that leverages the latent representation generated by Meta's EnCodec neural audio codec. We hypothesize that this compressed latent representation captures essential audio features and is better suited to classification tasks than traditional spectrogram-based representations. To validate this hypothesis, we train a vanilla convolutional neural network for music genre, speech/music, and environmental sound classification using EnCodec's encoder output as input. We then compare its performance with that of the same network trained on a spectrogram-based representation. Our experiments demonstrate that this approach achieves accuracy comparable to state-of-the-art methods while exhibiting significantly faster convergence and reduced computational load during training. These findings suggest the potential of EnCodec's latent representation for faster, more efficient, and less expensive audio classification applications. We also analyze the characteristics of EnCodec's output, providing insights into the advantages of this novel approach.
PB MDPI
SN 1999-4893
YR 2025
FD 2025-02-16
LK https://hdl.handle.net/10347/40815
UL https://hdl.handle.net/10347/40815
LA eng
NO Perianez-Pascual, J.; Gutiérrez, J.D.; Escobar-Encinas, L.; Rubio-Largo, Á.; Rodriguez-Echeverria, R. Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space. Algorithms 2025, 18, 108. https://doi.org/10.3390/a18020108
NO This work was supported by Grant CPP2021-008491 funded by MICIU/AEI/10.13039/50100011033 and by the European Union NextGenerationEU/PRTR.
DS Minerva
RD 28 Apr 2026