VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

25 January 2024

Chenpeng Du

Yiwei Guo

Hankun Wang

Yifan Yang

Zhikang Niu

Shuai Wang

Hui Zhang

Xie Chen

Kai Yu

VLM

Papers citing "VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech"

23 / 23 papers shown

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward

Guansu Wang

Peijie Sun

12 Nov 2025

AniME: Adaptive Multi-Agent Planning for Long Animation Generation

...

26 Aug 2025

Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding

30 May 2025

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation

26 May 2025

SpeakStream: Streaming Text-to-Speech with Interleaved Data

25 May 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

...

23 May 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

01 May 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

...

14 Apr 2025

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

10 Jan 2025

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

AAAI Conference on Artificial Intelligence (AAAI), 2024

18 Dec 2024

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

29 Oct 2024

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

17 Oct 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

09 Oct 2024

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

IEEE Signal Processing Letters (SPL), 2024

Minchan Kim

Myeonghun Jeong

Joun Yeop Lee

Nam Soo Kim

07 Oct 2024

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

Jin Xu

28 Sep 2024

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

Haohan Guo

Fenglong Xie

Dongchao Yang

Xixin Wu

Helen Meng

18 Sep 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

ACM Multimedia (MM), 2024

Yixuan Zhou

Xiaoyu Qin

Zeyu Jin

Shuoyi Zhou

Shun Lei

Songtao Zhou

Zhiyong Wu

Jia Jia

AuLLM

28 Aug 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Shuai Wang

Zheng-Shou Chen

Kong Aik Lee

Yan-min Qian

Haizhou Li

21 Jul 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Sefik Emre Eskimez

Xiaofei Wang

Manthan Thakker

Canrun Li

Chung-Hsien Tsai

...

Min Tang

Xu Tan

Yanqing Liu

Sheng Zhao

Naoyuki Kanda

VLM

26 Jun 2024

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Shujie Liu

Yanming Qian

Sheng Zhao

Jinyu Li

Furu Wei

12 Jun 2024

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

Hankun Wang

Chenpeng Du

Yiwei Guo

Shuai Wang

Xie Chen

Kai Yu

30 Apr 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Detai Xin

Xu Tan

Kai Shen

Zeqian Ju

Dongchao Yang

...

Hiroshi Saruwatari

04 Apr 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Puyuan Peng

Po-Yao (Bernie) Huang

Daniel Li

Abdelrahman Mohamed

David Harwath

25 Mar 2024