Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.02882
Cited By
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"
50 / 617 papers shown
Title
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
67
9
0
10 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
91
5
0
10 Jun 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
70
0
0
10 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
100
84
0
07 Jun 2024
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
Wangyou Zhang
Robin Scheibler
Kohei Saijo
Samuele Cornell
Chenda Li
...
Jan Pirklbauer
Marvin Sach
Shinji Watanabe
Tim Fingscheidt
Yanmin Qian
VLM
82
20
0
07 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
95
4
0
06 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
MoE
94
1
0
05 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
86
6
0
05 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
144
29
0
04 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
63
5
0
04 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
88
12
0
02 Jun 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Vicky Zayats
Peter Chen
Melissa Ferrari
Dirk Padfield
AI4CE
77
1
0
29 May 2024
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Wen-Chin Huang
Yi-Chiao Wu
Tomoki Toda
66
1
0
20 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
104
6
0
16 May 2024
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
You Zhang
Yongyi Zang
Jiatong Shi
Ryuichi Yamamoto
Jionghao Han
Yuxun Tang
Tomoki Toda
Zhiyao Duan
97
5
0
08 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
86
21
0
08 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
91
29
0
30 Apr 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
51
2
0
30 Apr 2024
Deep low-latency joint speech transmission and enhancement over a gaussian channel
Mohammad Bokaei
Jesper Jensen
Simon Doclo
Jan Østergaard
51
0
0
30 Apr 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
75
12
0
28 Apr 2024
An automatic mixing speech enhancement system for multi-track audio
Xiaojing Liu
Angeliki Mourgela
Hongwei Ai
Joshua D. Reiss
28
1
0
27 Apr 2024
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Haizhou Li
Zhizheng Wu
45
3
0
26 Apr 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
56
3
0
24 Apr 2024
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Sen Liu
Yiwei Guo
Xie Chen
Kai Yu
46
2
0
23 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
130
1
0
16 Apr 2024
The Impact of Speech Anonymization on Pathology and Its Limits
Soroosh Tayebi Arasteh
T. Arias-Vergara
Paula Andrea Pérez-Toro
Tobias Weise
Kai Packhaeuser
Maria Schuster
E. Noeth
Andreas Maier
Seung Hee Yang
92
7
0
11 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
88
1
0
09 Apr 2024
Cross-Domain Audio Deepfake Detection: Dataset and Analysis
Yuang Li
Min Zhang
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Hao Yang
67
9
0
07 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
63
2
0
06 Apr 2024
Dynamic Switch Layers For Unsupervised Learning
Haiguang Li
Usama Pervaiz
Michal Matuszak
Robert Kamara
Gilles Roux
T. Thormundsson
Joseph Antognini
124
1
0
05 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
76
28
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
122
44
0
03 Apr 2024
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Yu Pan
Lei Ma
Jianjun Zhao
87
6
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
125
25
0
03 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
57
6
0
31 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
133
79
0
25 Mar 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
50
0
0
25 Mar 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
65
1
0
20 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
134
3
0
08 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
58
2
0
06 Mar 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
107
29
0
20 Feb 2024
SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
Adam Sabra
C. Wronka
Michelle Mao
Samer Hijazi
36
1
0
19 Feb 2024
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
Miri Varshavsky-Hassid
Roy Hirsch
Regev Cohen
Tomer Golany
Daniel Freedman
Ehud Rivlin
67
3
0
19 Feb 2024
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Ziyue Jiang
Dingdong Wang
Hanting Wang
Jialung Zuo
Shulei Wang
AuLLM
94
0
0
19 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
74
16
0
14 Feb 2024
Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations
Álvaro Martín-Cortinas
Daniel Sáez-Trigueros
Iván Vallés-Pérez
Biel Tura Vecino
Piotr Bilinski
Mateusz Lajszczak
Grzegorz Beringer
Roberto Barra-Chicote
Jaime Lorenzo-Trueba
53
6
0
05 Feb 2024
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
Myrsini Christidou
Alexandra Vioni
...
Junkwang Oh
Gunu Jho
Inchul Hwang
Pirros Tsiakoulis
Aimilios Chalamandaris
52
1
0
02 Feb 2024
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
Linping Xu
Jiawei Jiang
Dejun Zhang
Xianjun Xia
Li Chen
Yijian Xiao
Piao Ding
Shenyi Song
Sixing Yin
Ferdous Sohel
57
7
0
02 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
162
2
0
02 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
93
13
0
01 Feb 2024
Previous
1
2
3
4
5
6
...
11
12
13
Next