ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
arXiv:1904.02882

Papers citing "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"

50 / 617 papers shown
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
80
4
0
31 Jan 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
36
1
0
31 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
117
30
0
25 Jan 2024
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
Akshit Arora
Rohan Badlani
Sungwon Kim
Rafael Valle
Bryan Catanzaro
23
0
0
24 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
101
0
0
23 Jan 2024
Adversarial speech for voice privacy protection from Personalized Speech generation
Shihao Chen
Liping Chen
Jie Zhang
KongAik Lee
Zhenhua Ling
Lirong Dai
AAML
48
1
0
22 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
207
64
0
22 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
123
1
0
16 Jan 2024
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
Ye-Xin Lu
Yang Ai
Hui-Peng Du
Zhenhua Ling
90
10
0
12 Jan 2024
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings
Christopher Li
Gary Wang
Kyle Kastner
Heng Su
Allen Chen
...
Zelin Wu
L. Velikovich
Pat Rondon
D. Caseiro
Petar S. Aleksic
53
1
0
08 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
85
5
0
07 Jan 2024
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang
Y. Kartynnik
Yunpeng Li
Jiuqiang Tang
Xing Li
George Sung
Matthias Grundmann
93
15
0
05 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
78
7
0
05 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
68
2
0
04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
48
4
0
03 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
52
6
0
02 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
124
94
0
25 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
116
25
0
22 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
72
14
0
17 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
91
35
0
15 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
109
18
0
14 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee
Yejin Jeon
Wonjun Lee
Yunsu Kim
Gary Geunbae Lee
39
1
0
04 Dec 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
Sébastien Marcel
CVBM
111
4
0
29 Nov 2023
Quantifying the redundancy between prosody and text
Lukas Wolf
Tiago Pimentel
Evelina Fedorenko
Ryan Cotterell
Alex Warstadt
Ethan Gotlieb Wilcox
Tamar I. Regev
75
11
0
28 Nov 2023
StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models
Kazuki Yamauchi
Yusuke Ijima
Yuki Saito
66
9
0
28 Nov 2023
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Zhizheng Wu
72
12
0
25 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
125
37
0
21 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov
Valeria Pronina
Alexander Kuzmin
Maksim Borisov
Nikita Usoltsev
Xingshan Zeng
Alexander Golubkov
Nikolai Ermolenko
Aleksandra Shirshova
Yulia Matveeva
52
5
0
16 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
86
0
0
08 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
72
30
0
08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
46
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
95
12
0
06 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
49
1
0
26 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
102
36
0
25 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens
Feiyu Shen
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
76
13
0
23 Oct 2023
A High Fidelity and Low Complexity Neural Audio Coding
Wenzhe Liu
Wei Xiao
Meng Wang
Shan Yang
Yupeng Shi
Yuyong Kang
Dan Su
Shidong Shang
Dong Yu
46
2
0
17 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
50
3
0
14 Oct 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM VLM CoGe
122
26
0
12 Oct 2023
Crowdsourced and Automatic Speech Prominence Estimation
Max Morrison
P. Pawar
Nathan Pruyne
Jennifer Cole
Bryan Pardo
74
5
0
12 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition
Nick Rossenbach
Benedikt Hilmes
Ralf Schlüter
54
3
0
12 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
145
17
0
11 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
79
3
0
09 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
64
7
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
56
5
0
08 Oct 2023
uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models
Muqiao Yang
Chunlei Zhang
Yong-mei Xu
Zhongweiyang Xu
Heming Wang
Bhiksha Raj
Dong Yu
DiffM
75
7
0
02 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
72
4
0
01 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM AuLLM
105
128
0
01 Oct 2023
GeRA: Label-Efficient Geometrically Regularized Alignment
Dustin Klebe
Tal Shnitzer
Mikhail Yurochkin
Leonid Karlinsky
Justin Solomon
71
2
0
01 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
42
4
0
29 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
124
82
0
25 Sep 2023