ResearchTrend.AI


Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures

29 November 2024
Alain Riou
Antonin Gagneré
Gaëtan Hadjeres
Stefan Lattner
Geoffroy Peeters
Abstract

In this paper, we tackle the task of musical stem retrieval: given a musical mix, the goal is to retrieve a stem that fits with it, i.e., that would sound pleasant when played together. To do so, we introduce a new method based on Joint-Embedding Predictive Architectures, in which an encoder and a predictor are jointly trained to produce latent representations of a context and to predict latent representations of a target. In particular, we design our predictor to be conditioned on arbitrary instruments, enabling our model to perform zero-shot stem retrieval. In addition, we find that pretraining the encoder with contrastive learning drastically improves the model's performance. We validate the retrieval performance of our model on the MUSDB18 and MoisesDB datasets, showing that it significantly outperforms previous baselines on both, and that it supports conditioning of varying precision, including on unseen instruments. We also evaluate the learned embeddings on a beat tracking task, demonstrating that they retain temporal structure and local information.
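The JEPA objective described above can be illustrated with a toy sketch: a context encoder embeds the mix, a target encoder embeds the stem, and a predictor, conditioned on an instrument vector, tries to predict the stem's latent from the mix's latent. Everything below (the linear encoders, the dimensions, the EMA coefficient) is a hypothetical stand-in for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper's real model is far larger).
d_audio, d_latent, d_cond = 32, 16, 8

# Context and target encoders, here reduced to linear maps. In JEPA-style
# training the target encoder is typically an EMA copy of the context encoder.
W_ctx = rng.standard_normal((d_audio, d_latent)) * 0.1
W_tgt = W_ctx.copy()

# Predictor: maps (context latent, instrument conditioning) -> predicted target latent.
W_pred = rng.standard_normal((d_latent + d_cond, d_latent)) * 0.1

def jepa_loss(mix, stem, cond):
    """L2 distance in latent space between the predicted and actual stem embeddings."""
    z_ctx = mix @ W_ctx                             # latent of the mix (context)
    z_tgt = stem @ W_tgt                            # latent of the stem (target)
    z_pred = np.concatenate([z_ctx, cond]) @ W_pred # conditioned prediction
    return float(np.mean((z_pred - z_tgt) ** 2))

mix = rng.standard_normal(d_audio)    # stand-in for mix features
stem = rng.standard_normal(d_audio)   # stand-in for stem features
cond = rng.standard_normal(d_cond)    # instrument conditioning vector

loss = jepa_loss(mix, stem, cond)

# EMA update of the target encoder, as is common in JEPA-style training.
tau = 0.99
W_tgt = tau * W_tgt + (1 - tau) * W_ctx
```

At retrieval time, the same machinery can rank candidate stems by how close their target-encoder embeddings lie to the predictor's output; swapping the conditioning vector for a different (possibly unseen) instrument is what enables the zero-shot behavior the abstract describes.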

View on arXiv
@article{riou2025_2411.19806,
  title={Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures},
  author={Alain Riou and Antonin Gagneré and Gaëtan Hadjeres and Stefan Lattner and Geoffroy Peeters},
  journal={arXiv preprint arXiv:2411.19806},
  year={2025}
}