Presto! Distilling Steps and Layers for Accelerating Music Generation

7 October 2024
Zachary Novack
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
Abstract

Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yields best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435 ms latency for 32-second mono/stereo 44.1 kHz audio, 15x faster than the comparable SOTA), the fastest high-quality TTM to our knowledge. Sound examples can be found at this https URL.
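
To make the layer-distillation idea concrete, the sketch below illustrates one possible training objective: the student (which skips some transformer layers) is matched to the teacher's hidden states, with an extra term that discourages shrinking the hidden-state variance. This is a hypothetical PyTorch sketch for illustration only, not the paper's implementation; the function name, the variance-matching term, and the var_weight coefficient are assumptions.

import torch
import torch.nn.functional as F

def layer_distill_loss(student_hidden: torch.Tensor,
                       teacher_hidden: torch.Tensor,
                       var_weight: float = 0.1) -> torch.Tensor:
    # student_hidden / teacher_hidden: (batch, seq_len, dim) activations taken
    # at matching depths; the student runs with some layers skipped.
    match = F.mse_loss(student_hidden, teacher_hidden)
    # Encourage the student to keep the teacher's per-feature variance,
    # which layer skipping tends to reduce (assumed motivation).
    var_term = F.mse_loss(student_hidden.var(dim=(0, 1)),
                          teacher_hidden.var(dim=(0, 1)))
    return match + var_weight * var_term

# Example usage with random tensors standing in for real activations:
student = torch.randn(2, 128, 512) * 0.8   # slightly lower variance
teacher = torch.randn(2, 128, 512)
loss = layer_distill_loss(student, teacher)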

@article{novack2025_2410.05167,
  title={Presto! Distilling Steps and Layers for Accelerating Music Generation},
  author={Zachary Novack and Ge Zhu and Jonah Casebeer and Julian McAuley and Taylor Berg-Kirkpatrick and Nicholas J. Bryan},
  journal={arXiv preprint arXiv:2410.05167},
  year={2025}
}