Learning Disentangled Audio Representations through Controlled Synthesis

Main: 1 Page
12 Figures
Bibliography: 2 Pages
1 Table
Appendix: 9 Pages

Abstract

This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone highlights its utility for method evaluation. Our results underscore strengths and limitations in audio disentanglement, motivating future research.
