ResearchTrend.AI
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu

Papers citing "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"

50 / 617 papers shown
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
70
16
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
52
11
0
24 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
81
2
0
21 Sep 2023
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Hsuan Su
Ting-Yao Hu
H. Koppula
Raviteja Vemulapalli
Jen-Hao Rick Chang
Karren D. Yang
G. Mantena
Oncel Tuzel
SyDa
68
3
0
18 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
88
4
0
18 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huatian Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
83
3
0
16 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
75
36
0
15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
95
25
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
71
3
0
15 Sep 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
76
3
0
14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
49
3
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
99
63
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM, AuLLM
123
69
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
82
27
0
14 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
87
8
0
13 Sep 2023
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao
Xin Eric Wang
Erica Cooper
Junichi Yamagishi
Nicholas W. D. Evans
Massimiliano Todisco
J. Bonastre
Mickael Rouvier
62
5
0
12 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
103
47
0
10 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
63
2
0
08 Sep 2023
A Two-Stage Training Framework for Joint Speech Compression and Enhancement
Jiayi Huang
Zeyu Yan
Wenbin Jiang
Fei Wen
49
1
0
08 Sep 2023
Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
55
1
0
06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
71
11
0
06 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
53
2
0
06 Sep 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
78
1
0
30 Aug 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
68
1
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
74
3
0
28 Aug 2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Ziyue Jiang
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
86
48
0
28 Aug 2023
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Jia-Jyu Su
Pang-Chen Liao
Yen-Ting Lin
Wu-Hao Li
Guan-Ting Liou
...
Wei-Cheng Chen
Jen-Chieh Chiang
Wen-Yang Chang
Pin-Han Lin
Chen-Yu Chiang
54
1
0
27 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
69
8
0
24 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen, DiffM
89
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
102
16
0
31 Jul 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
87
12
0
30 Jul 2023
Adaptation of Whisper models to child speech recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Peter Corcoran
H. Cucu
46
34
0
24 Jul 2023
Vocoder drift compensation by x-vector alignment in speaker anonymisation
Michele Panariello
Massimiliano Todisco
Nicholas W. D. Evans
65
2
0
17 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
30
2
0
03 Jul 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
73
22
0
28 Jun 2023
Two-Stage Voice Anonymization for Enhanced Privacy
F. Nespoli
Daniel Barreda
Joerg Bitzer
Patrick A. Naylor
58
3
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
69
4
0
27 Jun 2023
CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation
Yuhao Cui
Xiongwei Wang
Zhongzhou Zhao
Wei Zhou
Haiqing Chen
57
1
0
27 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
88
6
0
25 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
121
306
0
23 Jun 2023
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Zhichao Wang
Yuan-Jui Chen
Linfu Xie
Qiao Tian
Yuping Wang
156
32
0
18 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM, DiffM
141
126
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
102
44
0
13 Jun 2023
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
Hao Liu
Tao Wang
Jie Cao
Ran He
J. Tao
DiffM
68
4
0
09 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
85
6
0
07 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
53
2
0
02 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
132
104
0
01 Jun 2023
Speaker anonymization using orthogonal Householder neural network
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
BDL
74
20
0
30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
106
80
0
30 May 2023
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Ambuj Mehrish
Abhinav Ramesh Kashyap
Yingting Li
Navonil Majumder
Soujanya Poria
70
7
0
29 May 2023