RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding

12 June 2025
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
Main: 6 pages
5 figures
Bibliography: 3 pages
1 table
Abstract

Voice conversion has emerged as a pivotal technology in numerous applications ranging from assistive communication to entertainment. In this paper, we present RT-VC, a zero-shot real-time voice conversion system that delivers ultra-low latency and high-quality performance. Our approach leverages an articulatory feature space to naturally disentangle content and speaker characteristics, facilitating more robust and interpretable voice transformations. Additionally, the integration of differentiable digital signal processing (DDSP) enables efficient vocoding directly from articulatory features, significantly reducing conversion latency. Experimental evaluations demonstrate that, while maintaining synthesis quality comparable to the current state-of-the-art (SOTA) method, RT-VC achieves a CPU latency of 61.4 ms, representing a 13.3% reduction in latency.
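
The abstract gives only the new latency and the relative reduction. As a rough check of what those two figures imply, the sketch below backs out a baseline latency. It assumes the 13.3% reduction is measured against the CPU latency of the SOTA method being compared; the abstract does not state that baseline, so the derived value is an inference, not a reported result. The function name `implied_baseline_latency_ms` is hypothetical.

```python
def implied_baseline_latency_ms(new_latency_ms: float, reduction_pct: float) -> float:
    """Back out the baseline latency implied by a relative reduction.

    Assumes: new = baseline * (1 - reduction_pct / 100).
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return new_latency_ms / (1.0 - reduction_pct / 100.0)


# Figures taken from the abstract.
rt_vc_latency_ms = 61.4
reduction_pct = 13.3

# Derived values (assumption: the reduction is relative to the SOTA baseline).
baseline_ms = implied_baseline_latency_ms(rt_vc_latency_ms, reduction_pct)
saved_ms = baseline_ms - rt_vc_latency_ms

print(f"Implied baseline latency: {baseline_ms:.1f} ms")
print(f"Implied absolute saving:  {saved_ms:.1f} ms")
```

Under that assumption the baseline comes out at roughly 70.8 ms, an absolute saving of about 9.4 ms per conversion.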

@article{liu2025_2506.10289,
  title={RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding},
  author={Yisi Liu and Chenyang Wang and Hanjo Kim and Raniya Khan and Gopala Anumanchipalli},
  journal={arXiv preprint arXiv:2506.10289},
  year={2025}
}