Learning Disentangled Audio Representations through Controlled Synthesis
Main: 1 page; Bibliography: 2 pages; Appendix: 9 pages; 12 figures; 1 table
Abstract
This paper addresses the scarcity of benchmarking data for disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground-truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone highlights its utility for method evaluation. Our results underscore both the strengths and the limitations of current audio disentanglement methods, motivating future research.
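To make "explicit ground-truth explanatory factors" concrete, here is a minimal Python sketch of controlled audio synthesis in which every clip is generated from a known factor tuple. The factor names (waveform, frequency_hz, amplitude), their values, the 16 kHz sample rate, the 100 ms clip length, and the functions oscillator, synthesize and build_dataset are all hypothetical illustrations. They are not SynTone's actual factors or generator.

```python
import itertools
import math

# Hypothetical factor grid: names and values are illustrative assumptions,
# not the factors SynTone actually uses.
FACTORS = {
    "waveform": ["sine", "square", "sawtooth"],
    "frequency_hz": [220.0, 440.0, 880.0],
    "amplitude": [0.25, 0.5, 1.0],
}
SAMPLE_RATE = 16_000  # samples per second (assumed)
DURATION_MS = 100     # clip length in milliseconds (assumed)


def oscillator(waveform: str, phase: float) -> float:
    """Return one sample of a unit-amplitude periodic waveform; phase is in cycles."""
    frac = phase % 1.0
    if waveform == "sine":
        return math.sin(2.0 * math.pi * frac)
    if waveform == "square":
        return 1.0 if frac < 0.5 else -1.0
    if waveform == "sawtooth":
        return 2.0 * frac - 1.0
    raise ValueError(f"unknown waveform: {waveform}")


def synthesize(waveform: str, frequency_hz: float, amplitude: float) -> list[float]:
    """Render one clip whose content is fully determined by the given factors."""
    n_samples = SAMPLE_RATE * DURATION_MS // 1000
    return [
        amplitude * oscillator(waveform, frequency_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


def build_dataset() -> list[tuple[list[float], dict]]:
    """Enumerate the full factor grid, pairing each clip with its ground-truth labels."""
    names = list(FACTORS)
    dataset = []
    for values in itertools.product(*(FACTORS[name] for name in names)):
        factors = dict(zip(names, values))
        dataset.append((synthesize(**factors), factors))
    return dataset


if __name__ == "__main__":
    data = build_dataset()
    signal, labels = data[0]
    print(len(data), len(signal), labels)
```

Because each clip is produced from a known factor tuple, learned latent codes can be scored directly against these labels. This is the property that makes a synthetic set useful for evaluating disentanglement.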
