arXiv:2301.02111
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
5 January 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
Shujie Liu
Zhuo Chen
Yanqing Liu
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
ArXiv (abs)
PDF
HTML
Hugging Face (1 upvote)
GitHub (22,090★)
Papers citing "Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers"
50 / 611 papers shown
Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction
Teo Guichoux
Théodor Lemerle
Shivam Mehta
Jonas Beskow
G. Henter
Laure Soulier
Catherine Pelachaud
Nicolas Obin
30 Mar 2026
M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis
Xiaopeng Wang
Chunyu Qiang
Ruibo Fu
Zhengqi Wen
Xuefei Liu
...
Yuankun Xie
Heng Xie
Chenxing Li
Chen Zhang
Changsheng Li
DiffM
04 Dec 2025
Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization
Tal Shuster
Eliya Nachmani
01 Dec 2025
Harmonic-Percussive Disentangled Neural Audio Codec for Bandwidth Extension
Benoît Giniès
Xiaoyu Bie
Olivier Fercoq
Gaël Richard
26 Nov 2025
Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
Wei-Cheng Tseng
David Harwath
SSL
20 Nov 2025
Multi-modal Deepfake Detection and Localization with FPN-Transformer
Chende Zheng
Ruiqi Suo
Zhoulin Ji
Jingyi Deng
Fangbin Yi
Chenhao Lin
Chao Shen
ViT
11 Nov 2025
SynTTS-Commands: A Public Dataset for On-Device KWS via TTS-Synthesized Multilingual Speech
Lu Gan
Xi Li
11 Nov 2025
Step-Audio-EditX Technical Report
Chao Yan
Boyong Wu
Peng Yang
Pengfei Tan
Guoqiang Hu
...
Xiangyu Zhang
Daxin Jiang
Shuchang Zhou
Gang Yu
05 Nov 2025
NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion
Zongyang Du
Shreeram Suresh Chandra
Ismail Rasim Ulgen
Aurosweta Mahapatra
Ali N. Salman
Carlos Busso
Berrak Sisman
31 Oct 2025
Bayesian Speech Synthesizers Can Learn from Multiple Teachers
Ziyang Zhang
Yifan Gao
Xuenan Xu
Baoxiangli
Wen Wu
Chao Zhang
28 Oct 2025
MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration
Junhyuk So
Hyunho Kook
Chaeyeon Jang
Eunhyeok Park
28 Oct 2025
SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity
Hanke Xie
Haopeng Lin
Wenxiao Cao
Dake Guo
WenJie Tian
...
Shunshun Yin
Ming Tao
Xie Chen
Lei Xie
Xinsheng Wang
27 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
26 Oct 2025
U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
Xusheng Yang
Long Zhou
Wenfu Wang
Kai Hu
Shulin Feng
Chenxing Li
Meng Yu
Dong Yu
Y. Zou
19 Oct 2025
DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation
Yakun Song
Xiaobin Zhuang
Jiawei Chen
Zhikang Niu
Guanrou Yang
...
Zhuo Chen
Yuping Wang
Xie Chen
DiffM
14 Oct 2025
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking
Junhyuk So
Chiwoong Lee
Shinyoung Lee
Jungseul Ok
Eunhyeok Park
AI4CE
14 Oct 2025
Universal Discrete-Domain Speech Enhancement
Fei Liu
Yang Ai
Ye-Xin Lu
Rui Zheng
Hui-Peng Du
Zhen-Hua Ling
11 Oct 2025
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
Hanke Xie
Dake Guo
C. Wang
Yue Li
WenJie Tian
...
Xinsheng Wang
Xiulin Li
Guanqiong Miao
B. Liu
Lei Xie
AuLLM
09 Oct 2025
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Wenhao Guan
Zhikang Niu
Ziyue Jiang
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
Xie Chen
AuLLM
06 Oct 2025
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
Baher Mohammad
Magauiya Zhussip
Stamatios Lefkimmiatis
Mamba
06 Oct 2025
Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI
Jahidul Arafat
Fariha Tasmin
Sanjaya Poudel
Ahsan Habib Tareq
FedML
05 Oct 2025
Soft Disentanglement in Frequency Bands for Neural Audio Codecs
Benoît Giniès
Xiaoyu Bie
Olivier Fercoq
Gaël Richard
04 Oct 2025
Désentrelacement Fréquentiel Doux pour les Codecs Audio Neuronaux (French version; English: "Soft Frequency Disentanglement for Neural Audio Codecs")
Benoît Giniès
Xiaoyu Bie
Olivier Fercoq
Gaël Richard
04 Oct 2025
Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
Hieu-Nghia Huynh-Nguyen
Huynh Nguyen Dang
Ngoc Son Nguyen
Van Nguyen
03 Oct 2025
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Jiaqi Li
Y. Qian
Yuxuan Hu
Leying Zhang
Xiaofei Wang
Heng Lu
Manthan Thakker
Jinyu Li
Sheng Zhao
Zhizheng Wu
01 Oct 2025
HiStyle: Hierarchical Style Embedding Predictor for Text-Prompt-Guided Controllable Speech Synthesis
Ziyu Zhang
Hanzhao Li
Jingbin Hu
W. Li
Lei Xie
30 Sep 2025
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
Tianrui Wang
Haoyu Wang
Meng Ge
Cheng Gong
Chunyu Qiang
...
Xiaobao Wang
Eng Siong Chng
Xie Chen
Longbiao Wang
Jianwu Dang
29 Sep 2025
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Yixuan Zhou
Guoyang Zeng
Xin Liu
Xiang Li
Renjie Yu
...
Weiyue Sun
Jiancheng Gui
Kehan Li
Z. Wu
Zhiyuan Liu
29 Sep 2025
Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
Junjie Cao
Yichen Han
Ruonan Zhang
Xiaoyang Hao
Hongxiang Li
Shuaijiang Zhao
Yue Liu
Xiao-Ping Zhang
26 Sep 2025
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
Yihao Chen
Kai Hu
Long Zhou
Shulin Feng
Xusheng Yang
Hangting Chen
Xie Chen
26 Sep 2025
ArFake: A Multi-Dialect Benchmark and Baselines for Arabic Spoof-Speech Detection
Mohamed Maged
Alhassan Ehab
Ali Mekky
Besher Hassan
Shady Shehata
26 Sep 2025
AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit
Yi Zhu
Heitor R. Guimarães
Arthur Pimentel
Tiago H. Falk
25 Sep 2025
SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
T. Nguyen
Jaehun Kim
Ji-Hoon Kim
Shukjae Choi
Youshin Lim
Joon Son Chung
25 Sep 2025
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Tianqiao Liu
Xueyi Li
Hao Wang
Haoxuan Li
Zhichao Chen
Weiqi Luo
Zitao Liu
AuLLM
24 Sep 2025
Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
Ismail Rasim Ulgen
Zongyang Du
Junchen Lu
Philipp Koehn
Berrak Sisman
24 Sep 2025
WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
Binbin Zhang
Chengdong Liang
Shuai Wang
Xuelong Geng
Zhao Guo
...
Hao Yin
XiPeng Yang
Pengshen Zhang
Changwei Ma
Lei Xie
AuLLM
VLM
24 Sep 2025
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances
Junhyeok Lee
Helin Wang
Yaohan Guan
Thomas Thebaud
Laureano Moro-Velazquez
Jesus Villalba
Najim Dehak
21 Sep 2025
MBCodec: Thorough disentangle for high-fidelity audio compression
Ruonan Zhang
Xiaoyang Hao
Yichen Han
Junjie Cao
Yue Liu
Kai Zhang
21 Sep 2025
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
Nikita Torgashov
Gustav Eje Henter
Gabriel Skantze
VLM
19 Sep 2025
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
Luca Della Libera
Cem Subakan
Mirco Ravanelli
19 Sep 2025
Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Qingyu Liu
Yihao Chen
Zhikang Niu
Chunhui Wang
Yunting Yang
Bowen Zhang
Jian Zhao
Pengcheng Zhu
K. Yu
Xie Chen
18 Sep 2025
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
Ye-Xin Lu
Yu Gu
Kun Wei
Hui-Peng Du
Yang Ai
Zhen-Hua Ling
DiffM
18 Sep 2025
Neural Audio Codecs for Prompt-Driven Universal Sound Separation
Adhiraj Banerjee
Vipul Arora
VLM
15 Sep 2025
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
Md Mubtasim Ahasan
Rafat Hasan Khan
Tasnim Mohiuddin
Vasu Sharma
Tariq Iqbal
M. A. Amin
Amin Ahsan Ali
M. Islam
A. K. M. Mahbubur Rahman
14 Sep 2025
Length-Aware Rotary Position Embedding for Text-Speech Alignment
Hyeongju Kim
Juheon Lee
Jinhyeok Yang
Jacob Morton
AuLLM
14 Sep 2025
GmSLM: Generative Marmoset Spoken Language Modeling
Talia Sternberg
Michael London
David Omer
Yossi Adi
AuLLM
11 Sep 2025
DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners
Xiaoxue Luo
Jinwei Huang
Runyan Yang
Yingying Gao
Junlan Feng
Chao Deng
Shilei Zhang
11 Sep 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
Yuhao Zhang
Yuhao Du
Zhanchen Dai
Xiangnan Ma
Kaiqi Kou
Benyou Wang
Haizhou Li
11 Sep 2025
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Harry Julian
Rachel Beeson
Lohith Konathala
Johanna Ulin
Jiameng Gao
11 Sep 2025
Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling
Neil Zeghidour
Eugene Kharitonov
Manu Orsini
Václav Volhejn
Gabriel de Marmiesse
Edouard Grave
P. Pérez
Laurent Mazaré
Alexandre Défossez
OffRL
10 Sep 2025