1904.02882
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
Papers citing
"LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"
50 / 617 papers shown
Title
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
70
16
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
52
11
0
24 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
81
2
0
21 Sep 2023
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Hsuan Su
Ting-Yao Hu
H. Koppula
Raviteja Vemulapalli
Jen-Hao Rick Chang
Karren D. Yang
G. Mantena
Oncel Tuzel
SyDa
68
3
0
18 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
88
4
0
18 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huatian Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
83
3
0
16 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
75
36
0
15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
95
25
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
71
3
0
15 Sep 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
76
3
0
14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
49
3
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
99
63
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
123
69
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
82
27
0
14 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
87
8
0
13 Sep 2023
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao
Xin Eric Wang
Erica Cooper
Junichi Yamagishi
Nicholas W. D. Evans
Massimiliano Todisco
J. Bonastre
Mickael Rouvier
62
5
0
12 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
103
47
0
10 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
63
2
0
08 Sep 2023
A Two-Stage Training Framework for Joint Speech Compression and Enhancement
Jiayi Huang
Zeyu Yan
Wenbin Jiang
Fei Wen
49
1
0
08 Sep 2023
Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
55
1
0
06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
71
11
0
06 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
53
2
0
06 Sep 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
78
1
0
30 Aug 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
68
1
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
74
3
0
28 Aug 2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Ziyue Jiang
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
86
48
0
28 Aug 2023
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Jia-Jyu Su
Pang-Chen Liao
Yen-Ting Lin
Wu-Hao Li
Guan-Ting Liou
...
Wei-Cheng Chen
Jen-Chieh Chiang
Wen-Yang Chang
Pin-Han Lin
Chen-Yu Chiang
54
1
0
27 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
69
8
0
24 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
89
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
102
16
0
31 Jul 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
87
12
0
30 Jul 2023
Adaptation of Whisper models to child speech recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Peter Corcoran
H. Cucu
46
34
0
24 Jul 2023
Vocoder drift compensation by x-vector alignment in speaker anonymisation
Michele Panariello
Massimiliano Todisco
Nicholas W. D. Evans
65
2
0
17 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
30
2
0
03 Jul 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
73
22
0
28 Jun 2023
Two-Stage Voice Anonymization for Enhanced Privacy
F. Nespoli
Daniel Barreda
Joerg Bitzer
Patrick A. Naylor
58
3
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
69
4
0
27 Jun 2023
CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation
Yuhao Cui
Xiongwei Wang
Zhongzhou Zhao
Wei Zhou
Haiqing Chen
57
1
0
27 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
88
6
0
25 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
121
306
0
23 Jun 2023
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Zhichao Wang
Yuan-Jui Chen
Linfu Xie
Qiao Tian
Yuping Wang
156
32
0
18 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
141
126
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
102
44
0
13 Jun 2023
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
Hao Liu
Tao Wang
Jie Cao
Ran He
J. Tao
DiffM
68
4
0
09 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
85
6
0
07 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
53
2
0
02 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
132
104
0
01 Jun 2023
Speaker anonymization using orthogonal Householder neural network
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
BDL
74
20
0
30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
106
80
0
30 May 2023
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Ambuj Mehrish
Abhinav Ramesh Kashyap
Yingting Li
Navonil Majumder
Soujanya Poria
70
7
0
29 May 2023