2401.14321
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
25 January 2024
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
Papers citing
"VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech"
23 / 23 papers shown
Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
Guansu Wang
Peijie Sun
90
0
0
12 Nov 2025
AniME: Adaptive Multi-Agent Planning for Long Animation Generation
Lisai Zhang
Baohan Xu
Siqian Yang
Mingyu Yin
Jing Liu
...
Yidi Wu
Y. Hong
Zihao Zhang
Yanzhang Liang
Yudong Jiang
AI4CE
115
2
0
26 Aug 2025
Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
Yu Xi
Xiaoyu Gu
Haoyu Li
Jun Song
Bo Zheng
Kai Yu
167
1
0
30 May 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
271
0
0
26 May 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM
AI4TS
273
5
0
25 May 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
398
94
0
23 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
427
6
0
01 May 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
410
8
0
14 Apr 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
339
4
0
10 Jan 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Yongjun Xu
Yizhi Zhou
Haina Zhu
Haoyang Li
KELM
1.0K
8
0
18 Dec 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
412
18
0
29 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
322
9
0
17 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
752
340
0
09 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
IEEE Signal Processing Letters (SPL), 2024
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
215
1
0
07 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
312
5
0
28 Sep 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo
Fenglong Xie
Dongchao Yang
Xixin Wu
Helen Meng
320
8
0
18 Sep 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
ACM Multimedia (MM), 2024
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
388
27
0
28 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
377
28
0
21 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
327
173
0
26 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
265
54
0
12 Jun 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
196
6
0
30 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
332
39
0
04 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
539
169
0
25 Mar 2024