MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

International Conference on Learning Representations (ICLR), 2024
1 September 2024
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Xueyao Zhang
Shunsi Zhang
Zhizheng Wu

Papers citing "MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer"

12 / 62 papers shown
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM, KELM, VLM
402
1
0
06 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
966
2
0
02 May 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
313
6
0
14 Apr 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
Feng Jiang
Zhiyu Lin
Fan Bu
Yuhao Du
Benyou Wang
Haoyang Li
AuLLM, ELM
257
9
0
07 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xiang Wang
Mingqi Jiang
Tianhao Shen
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Xu Tan
Wei Xue
265
96
0
03 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM, VLM
550
12
0
26 Feb 2025
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
Zhengyan Sheng
Zhihao Du
Shiliang Zhang
Zhijie Yan
Yexin Yang
Zhenhua Ling
278
6
0
16 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Longji Xu
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
348
13
0
27 Jan 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Junan Zhang
Jing Yang
Zihao Fang
Longji Xu
Zehua Zhang
Zhuo Wang
Fan Fan
Zhikai Wu
DiffM
435
23
0
26 Jan 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2024
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
596
15
0
12 Dec 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
578
256
0
09 Oct 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
300
75
0
05 Sep 2024