ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability

Reverberation encodes spatial information about the acoustic environment of the source, yet traditional Speech Restoration (SR) typically removes it entirely. We propose ReverbMiipher, an SR model that extends the parametric resynthesis framework and is designed to denoise speech while preserving reverberation and enabling control over it. ReverbMiipher incorporates a dedicated ReverbEncoder that extracts a reverb feature vector from the noisy input. This feature conditions a vocoder that reconstructs the speech signal, removing noise while retaining the original reverberation characteristics. A stochastic zero-vector replacement strategy during training ensures that the feature specifically encodes reverberation, disentangling it from other speech attributes. The learned representation enables reverberation control through techniques such as interpolating between features, replacing a feature with one from another utterance, or sampling from a latent space. Objective and subjective evaluations confirm that ReverbMiipher effectively preserves reverberation, removes other artifacts, and outperforms the conventional two-stage approach of applying SR and then convolving the result with a simulated room impulse response. We further demonstrate its ability to generate novel reverberation effects through feature manipulation.
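The two feature-level operations named in the abstract, stochastic zero-vector replacement and interpolation between reverb features, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the feature dimension of 8, the drop probability, and the choice of linear interpolation. The abstract does not specify these details.

```python
import numpy as np


def maybe_zero_reverb_feature(reverb_feat, drop_prob, rng):
    """Hypothetical helper: with probability drop_prob, replace the
    reverb feature with an all-zero vector of the same shape."""
    if rng.random() < drop_prob:
        return np.zeros_like(reverb_feat)
    return reverb_feat


def interpolate_reverb_features(feat_a, feat_b, alpha):
    """Hypothetical helper: linear interpolation between two features.
    alpha=0 returns feat_a, alpha=1 returns feat_b."""
    return (1.0 - alpha) * feat_a + alpha * feat_b


rng = np.random.default_rng(0)

# Stand-ins for ReverbEncoder outputs from two utterances (assumed dimension: 8).
feat_a = rng.standard_normal(8)
feat_b = rng.standard_normal(8)

# drop_prob=1.0 always zeroes the feature; drop_prob=0.0 never does.
dropped = maybe_zero_reverb_feature(feat_a, drop_prob=1.0, rng=rng)
kept = maybe_zero_reverb_feature(feat_a, drop_prob=0.0, rng=rng)

# A feature halfway between the two utterances' reverberation features.
midpoint = interpolate_reverb_features(feat_a, feat_b, alpha=0.5)
```

In a setup like this, the manipulated feature (for example `midpoint`, or `feat_b` substituted for `feat_a`) would be passed to the vocoder as conditioning in place of the original feature. How the actual model consumes it is not described in the abstract.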
@article{nakata2025_2505.05077,
  title={ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability},
  author={Wataru Nakata and Yuma Koizumi and Shigeki Karita and Robin Scheibler and Haruko Ishikawa and Adriana Guevara-Rukoz and Heiga Zen and Michiel Bacchiani},
  journal={arXiv preprint arXiv:2505.05077},
  year={2025}
}