Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing. Computer Vision and Pattern Recognition (CVPR), 2025
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
PRESENT: Zero-Shot Text-to-Prosody Control. IEEE Signal Processing Letters (SPL), 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness. International Conference on Language Resources and Evaluation (LREC), 2024
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP. Interspeech, 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. Neural Information Processing Systems (NeurIPS), 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech. Interspeech, 2023