Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.02882
Cited By
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"
50 / 617 papers shown
Title
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
59
4
0
28 May 2023
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Fei Kong
Jinhao Duan
Ruipeng Ma
Hengtao Shen
Xiao-lan Zhu
Xiaoshuang Shi
Kaidi Xu
DiffM
53
34
0
26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
70
1
0
26 May 2023
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
77
35
0
25 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
87
19
0
23 May 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
97
17
0
23 May 2023
EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels
K. A. Noriy
Xiaosong Yang
Jian Jun Zhang
47
5
0
22 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
77
7
0
18 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
83
1
0
18 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
59
19
0
18 May 2023
Better speech synthesis through scaling
James Betker
CLIP
72
73
0
12 May 2023
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
Zhichao Wang
Liumeng Xue
Qiuqiang Kong
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
BDL
91
3
0
12 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
67
6
0
11 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
59
1
0
09 May 2023
Accented Text-to-Speech Synthesis with Limited Data
Xuehao Zhou
Mingyang Zhang
Yi Zhou
Zhizheng Wu
Haizhou Li
71
15
0
08 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
72
45
0
25 Apr 2023
What does BERT learn about prosody?
Sofoklis Kakouros
Johannah O'Mahony
MILM
50
6
0
25 Apr 2023
DiffVoice: Text-to-Speech with Latent Diffusion
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
103
25
0
23 Apr 2023
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
47
1
0
19 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
104
247
0
18 Apr 2023
ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi
Vahid Saeedi
J. Peymanfard
Leila Alhabib
Hossein Zeinali
48
2
0
07 Apr 2023
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Chenpeng Du
Qi Chen
Xie Chen
K. Yu
DiffM
121
51
0
30 Mar 2023
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages
Seong-Hyun Park
Myungseo Song
Bohyung Kim
Tae-Hyun Oh
32
1
0
28 Mar 2023
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren D. Yang
Ting-Yao Hu
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
73
14
0
27 Mar 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
78
3
0
21 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
102
7
0
06 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
61
1
0
03 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
87
29
0
03 Mar 2023
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition
P. Klumpp
Pooja Chitkara
Leda Sari
Prashant Serai
Jilong Wu
Irina-Elena Veliche
Rongqing Huang
Qing He
53
4
0
01 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
78
15
0
27 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
94
31
0
27 Feb 2023
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Ke Wang
Tomoki Koriyama
Yuki Saito
Takaaki Saeki
Detai Xin
Hiroshi Saruwatari
60
7
0
27 Feb 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment
Huaying Xue
Xiulian Peng
Yan Lu
76
4
0
26 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
161
3
0
18 Feb 2023
Speaker-Independent Acoustic-to-Articulatory Speech Inversion
Peter Wu
Li-Wei Chen
Cheol Jun Cho
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
112
29
0
14 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
101
206
0
07 Feb 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
233
344
0
30 Jan 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
101
18
0
30 Jan 2023
On granularity of prosodic representations in expressive text-to-speech
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
49
6
0
26 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
47
10
0
22 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
104
3
0
10 Jan 2023
SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain
Heli Qi
Sashi Novitasari
Andros Tjandra
S. Sakti
Satoshi Nakamura
70
3
0
08 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
193
727
0
05 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
97
23
0
30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
78
19
0
29 Dec 2022
Voice conversion with limited data and limitless data augmentations
Olga Slizovskaia
Jordi Janer
Pritish Chandna
Oscar Mayor
52
1
0
27 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
69
1
0
11 Dec 2022
DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech
Kazuki Kawamura
Jun Rekimoto
90
0
0
08 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
78
2
0
06 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
104
13
0
30 Nov 2022
Previous
1
2
3
...
7
8
9
...
11
12
13
Next