MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

International Conference on Learning Representations (ICLR), 2025

1 September 2024

Yuancheng Wang

Zhizheng Wu

Papers citing "MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer"

12 / 62 papers shown

SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation

International Joint Conference on Artificial Intelligence (IJCAI), 2025

421

06 May 2025

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing

975

02 May 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

357

14 Apr 2025

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

293

07 Mar 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

278

101

03 Mar 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

584

26 Feb 2025

SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer

294

16 Feb 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

365

27 Jan 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

473

26 Jan 2025

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Computer Vision and Pattern Recognition (CVPR), 2025

623

12 Dec 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

609

269

09 Oct 2024

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

Xu Tang

Kun Xie

Kai-Tuo Xu

332

05 Sep 2024