SynthCloner: Synthesizer-style Audio Transfer via Factorized Codec with ADSR Envelope Control

29 September 2025

Main: 4 pages, 2 figures, 2 tables. Bibliography: 1 page.

Abstract

Electronic synthesizer sounds are controlled by parameter settings that yield complex timbral characteristics and ADSR envelopes, making synthesizer-style audio transfer particularly challenging. Recent approaches to timbre transfer often rely on spectral objectives or implicit style matching, offering limited control over envelope shaping. Moreover, public synthesizer datasets rarely provide diverse coverage of timbres and ADSR envelopes. To address these gaps, we present SynthCloner, a factorized codec model that disentangles audio into three attributes: ADSR envelope, timbre, and content. This separation enables expressive audio transfer with independent control over these attributes. Additionally, we introduce SynthCAT, a new synthesizer dataset with a task-specific rendering pipeline covering 250 timbres, 120 ADSR envelopes, and 100 MIDI sequences. Experiments show that SynthCloner outperforms baselines on both objective and subjective metrics, while enabling independent attribute control. The code, model checkpoint, and audio examples are available at this https URL.
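
For readers unfamiliar with the term, an ADSR envelope shapes a note's amplitude over four stages: attack, decay, sustain, and release. The sketch below is a generic piecewise-linear envelope written for illustration only; it is not the envelope representation used by SynthCloner or the SynthCAT rendering pipeline, and the function name `adsr_envelope` and its parameters are assumptions introduced here.

```python
def adsr_envelope(attack, decay, sustain_level, release, note_length, sample_rate=1000):
    """Return a piecewise-linear ADSR amplitude envelope as a list of floats.

    attack, decay, release, and note_length are in seconds;
    sustain_level is an amplitude in [0, 1].
    note_length is how long the key is held (attack + decay + sustain time).
    All names here are illustrative, not taken from the paper.
    """
    n_hold = int(round(note_length * sample_rate))
    n_release = int(round(release * sample_rate))
    env = []
    for i in range(n_hold):
        t = i / sample_rate
        if t < attack:
            # Attack: ramp from 0 up to full amplitude
            level = t / attack
        elif t < attack + decay:
            # Decay: ramp from full amplitude down to the sustain level
            level = 1.0 - (1.0 - sustain_level) * (t - attack) / decay
        else:
            # Sustain: hold the level until the key is released
            level = sustain_level
        env.append(level)
    # Release: ramp from the level reached at key-off down to 0
    start = env[-1] if env else 0.0
    for i in range(n_release):
        env.append(start * (1.0 - (i + 1) / n_release))
    return env


# Example: 100 ms attack, 100 ms decay, sustain at 0.5, 200 ms release,
# key held for 500 ms, sampled at 100 Hz for readability.
env = adsr_envelope(0.1, 0.1, 0.5, 0.2, 0.5, sample_rate=100)
```

Multiplying such an envelope sample by sample with an oscillator's output gives the amplitude contour of a single note, which is the kind of attribute the abstract describes controlling independently of timbre and content.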
