Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.08093
Cited By
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
12 February 2024
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
Fan Yang
Arnaud Joly
Álvaro Martín-Cortinas
Ammar Abbas
Adam Michalski
Alexis Moinet
S. Karlapati
Ewa Muszyñska
Haohan Guo
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data"
50 / 55 papers shown
Title
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
23
0
0
14 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
51
0
0
11 Apr 2025
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
Yizhong Geng
Jizhuo Xu
Zeyu Liang
Jinghan Yang
Xiaoyi Shi
Xiaoyu Shen
19
0
0
10 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
39
1
0
03 Apr 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
88
0
0
21 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
52
3
0
28 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
J. Zhang
Lu Lu
Y. Wang
Haizhou Li
Z. Wu
AuLLM
77
16
0
17 Jan 2025
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
Dake Guo
J.-H. Yao
Xinfa Zhu
Kangxiang Xia
Zhao Guo
Ziyu Zhang
Y. Wang
Jie Liu
Lei Xie
18
0
0
31 Oct 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
31
5
0
29 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
22
1
0
22 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
26
0
0
17 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
30
2
0
16 Oct 2024
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
Rui Liu
Zhenqi Jia
Jie Yang
Yifan Hu
H. Li
24
2
0
12 Oct 2024
Graded Suspiciousness of Adversarial Texts to Human
Shakila Mahjabin Tonni
Pedro Faustini
Mark Dras
AAML
16
0
0
06 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
24
3
0
01 Oct 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
Run Chen
Julia Hirschberg
19
3
0
01 Oct 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
27
2
0
26 Sep 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
44
5
0
25 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
25
4
0
18 Sep 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo
Fenglong Xie
Dongchao Yang
Xixin Wu
Helen Meng
36
1
0
18 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
58
6
0
13 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
20
1
0
12 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
Kun Xie
Kai-Tuo Xu
32
20
0
05 Sep 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Haohan Guo
Fenglong Xie
Kun Xie
Dongchao Yang
Dake Guo
Xixin Wu
Helen Meng
26
4
0
02 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Shunsi Zhang
Zhizheng Wu
23
37
0
01 Sep 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
52
0
0
30 Aug 2024
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Zehai Tu
Guangyan Zhang
Yiting Lu
Adaeze Adigwe
Simon King
Yiwen Guo
27
0
0
29 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
29
23
0
05 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
26
4
0
21 Jul 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
27
1
0
18 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
32
30
0
11 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
34
2
0
04 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
43
3
0
02 Jul 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
24
5
0
13 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Yue Li
Xinsheng Wang
Li Zhang
Lei Xie
29
1
0
12 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
28
2
0
08 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
29
64
0
08 Jun 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Haohan Guo
Fenglong Xie
Dongchao Yang
Hui Lu
Xixin Wu
Helen Meng
45
6
0
05 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
34
3
0
05 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
J. Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Y. Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
36
74
0
04 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
33
10
0
02 Jun 2024
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Wen-Chin Huang
Yi-Chiao Wu
T. Toda
27
1
0
20 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
31
4
0
16 May 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
24
1
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
34
4
0
30 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
23
1
0
10 Apr 2024
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
16
637
0
05 Jan 2023
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Yihan Wu
Xi Wang
S. Zhang
Lei He
Ruihua Song
J. Nie
23
14
0
25 Jun 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
339
0
21 May 2022
1
2
Next