ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.02882
  4. Cited By
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
ArXiv (abs)PDFHTML

Papers citing "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"

50 / 617 papers shown
Title
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion
  of Whispered and Regular Speech
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
75
0
0
21 Aug 2024
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Zhijun Jia
Huaying Xue
Xiulian Peng
Yan Lu
145
3
0
19 Aug 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow
  Matching Optimization
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
85
2
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform
  Generation
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OODDiffMAI4TS
102
6
0
14 Aug 2024
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation
Maximilian Baronig
Romain Ferrand
Silvester Sabathiel
Robert Legenstein
81
6
0
14 Aug 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis
  Vocoders
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
54
0
0
13 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
85
5
0
12 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
109
2
0
12 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
J. Tao
AI4TS
92
0
0
11 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
57
2
0
08 Aug 2024
Language Model Can Listen While Speaking
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Yansen Wang
Xie Chen
AuLLM
99
28
0
05 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
65
0
0
05 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End
  Modeling with LM Knowledge Distillation
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
61
2
0
01 Aug 2024
Generative Expressive Conversational Speech Synthesis
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
119
6
0
31 Jul 2024
Towards Improving NAM-to-Speech Synthesis Intelligibility using
  Self-Supervised Speech Models
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
N. Shah
Shirish S. Karande
Vineet Gandhi
67
1
0
26 Jul 2024
Coupling Speech Encoders with Downstream Text Models
Coupling Speech Encoders with Downstream Text Models
Ciprian Chelba
J. Schalkwyk
AuLLM
67
0
0
24 Jul 2024
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech
  SpeechT5 Model
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
Jan Lehecka
Z. Hanzlícek
J. Matousek
Daniel Tihelka
66
0
0
24 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
110
6
0
22 Jul 2024
TTSDS -- Text-to-Speech Distribution Score
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondˇrej Klejch
Peter Bell
78
0
0
17 Jul 2024
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural
  Network
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang
Jiahong Zhang
Yong Ren
Man Yao
Richard D. Shang
Boxing Xu
Guoqi Li
DiffM
59
2
0
17 Jul 2024
Target conversation extraction: Source separation using turn-taking
  dynamics
Target conversation extraction: Source separation using turn-taking dynamics
Tuochao Chen
Qirui Wang
Bohan Wu
Malek Itani
Sefik Emre Eskimez
Takuya Yoshioka
Shyamnath Gollakota
71
6
0
15 Jul 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio
  Synthesis
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
103
10
0
15 Jul 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for
  Speech Separation, Recognition, and Synthesis
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Xilin Jiang
Yinghao Aaron Li
Adrian Nicolas Florea
Cong Han
N. Mesgarani
Mamba
97
14
0
13 Jul 2024
A Preliminary Investigation on Flexible Singing Voice Synthesis Through
  Decomposed Framework with Inferrable Features
A Preliminary Investigation on Flexible Singing Voice Synthesis Through Decomposed Framework with Inferrable Features
Lester Phillip Violeta
Taketo Akama
56
0
0
12 Jul 2024
3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental
  Health Detection
3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection
R. Cabral
Siwen Luo
Josiah Poon
S. Han
46
0
0
12 Jul 2024
Audio Spotforming Using Nonnegative Tensor Factorization with
  Attractor-Based Regularization
Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization
Shoma Ayano
Li Li
Shogo Seki
Daichi Kitamura
18
0
0
12 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
150
43
0
11 Jul 2024
A Benchmark for Multi-speaker Anonymization
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
97
1
0
08 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for
  Text-to-Speech Speaker Adaptation
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Ruibo Fu
Xin Qi
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xiaopeng Wang
Shuchen Shi
Yukun Liu
Xuefei Liu
Shuai Zhang
92
0
0
07 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for
  Large-Scale Speech Generation
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
91
54
0
07 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for
  Natural Interaction Between Humans and LLMs
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
139
57
0
04 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
97
2
0
04 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference
  Optimization
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
82
4
0
02 Jul 2024
Factor-Conditioned Speaking-Style Captioning
Factor-Conditioned Speaking-Style Captioning
Atsushi Ando
Takafumi Moriya
Shota Horiguchi
Ryo Masumura
73
0
0
27 Jun 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text
  Alignment
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLMAuLLM
96
19
0
27 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic
  Alignment
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
76
14
0
25 Jun 2024
Sound Tagging in Infant-centric Home Soundscapes
Sound Tagging in Infant-centric Home Soundscapes
Mohammad Nur Hossain Khan
Jialu Li
Nancy L. McElwain
M. Hasegawa-Johnson
Bashima Islam
45
0
0
25 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot
  Speaker Adaptive Text-to-Speech
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
104
10
0
21 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
110
8
0
18 Jun 2024
Universal Score-based Speech Enhancement with High Content Preservation
Universal Score-based Speech Enhancement with High Content Preservation
Robin Scheibler
Yusuke Fujita
Yuma Shirahata
Tatsuya Komatsu
DiffM
100
15
0
18 Jun 2024
1000 African Voices: Advancing inclusive multi-speaker multi-accent
  speech synthesis
1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis
Sewade Ogun
A. Owodunni
Tobi Olatunji
Eniola Alese
Babatunde Oladimeji
Tejumade Afonja
Kayode Olaleye
Naome A. Etori
Tosin Adewumi
78
6
0
17 Jun 2024
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li
Meiying Melissa Chen
Yi Zhong
Pinxin Liu
Zhiyao Duan
39
0
0
15 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with
  Progressive Constraints in a Dual-mode Training Strategy
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
67
3
0
14 Jun 2024
End-to-end Streaming model for Low-Latency Speech Anonymization
End-to-end Streaming model for Low-Latency Speech Anonymization
Waris Quamer
Ricardo Gutierrez-Osuna
96
0
0
13 Jun 2024
On Improving Error Resilience of Neural End-to-End Speech Coders
On Improving Error Resilience of Neural End-to-End Speech Coders
Kishan Gupta
N. Pia
Srikanth Korse
Andreas Brendel
Guillaume Fuchs
M. Multrus
75
0
0
13 Jun 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal
  Dysarthric Speech Reconstruction
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Xueyuan Chen
Dongchao Yang
Dingdong Wang
Xixin Wu
Zhiyong Wu
Helen Meng
68
2
0
12 Jun 2024
Asynchronous Voice Anonymization Using Adversarial Perturbation On
  Speaker Embedding
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
Rui Wang
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
65
3
0
12 Jun 2024
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Yi Lu
Yuankun Xie
Ruibo Fu
Zhengqi Wen
Jianhua Tao
...
Xuefei Liu
Yongwei Li
Yukun Liu
Xiaopeng Wang
Shuchen Shi
71
1
0
12 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts
  for Text-to-Speech and Style Captioning
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura
Ryuichi Yamamoto
Yuma Shirahata
Takuya Hasumi
Kentaro Tachibana
VLM
73
12
0
12 Jun 2024
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
Yuki Saito
Takuto Igarashi
Kentaro Seki
Shinnosuke Takamichi
Ryuichi Yamamoto
Kentaro Tachibana
Hiroshi Saruwatari
41
0
0
11 Jun 2024
Previous
12345...111213
Next