2305.18802
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
30 May 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
Papers citing
"LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus"
48 / 48 papers shown
Title
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
100
1
0
07 May 2025
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
Heitor R. Guimarães
Jiaqi Su
Rithesh Kumar
Tiago H. Falk
Zeyu Jin
DiffM
30
2
0
13 Apr 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David F. Harwath
Eunsol Choi
CLIP
VLM
75
0
0
06 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
X. Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
73
10
0
03 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
70
0
0
01 Mar 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
39
0
0
27 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
91
0
0
21 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan
Hung-yi Lee
63
0
0
08 Feb 2025
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Shuqi Dai
Yunyun Wang
Roger B. Dannenberg
Zeyu Jin
DiffM
53
0
0
23 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
56
0
0
15 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
34
0
0
10 Jan 2025
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Jinzuomu Zhong
Korin Richmond
Zhiba Su
Siqi Sun
53
4
0
10 Jan 2025
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
34
0
0
31 Dec 2024
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
31
0
0
31 Dec 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
32
0
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
29
2
0
08 Oct 2024
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
Tuan Nam Nguyen
Ngoc-Quan Pham
A. Waibel
28
1
0
30 Sep 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
32
2
0
26 Sep 2024
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
Lauri Juvela
Xin Eric Wang
21
2
0
20 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
27
1
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
64
0
0
14 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
33
2
0
30 Aug 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
80
0
0
30 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
27
4
0
12 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
19
2
0
01 Aug 2024
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondřej Klejch
Peter Bell
26
0
0
17 Jul 2024
Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models
Borodin Kirill Nikolayevich
Kudryavtsev Vasiliy Dmitrievich
Mkrtchian Grach Maratovich
Gorodnichev Mikhail Genadievich
Korzh Dmitrii Sergeevich
26
0
0
27 Jun 2024
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework
Hokuto Munakata
Ryo Terashima
Yusuke Fujita
23
0
0
24 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
34
5
0
21 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
32
1
0
18 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura
Ryuichi Yamamoto
Yuma Shirahata
Takuya Hasumi
Kentaro Tachibana
VLM
22
5
0
12 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
32
66
0
07 Jun 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
32
1
0
21 May 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
Voice Attribute Editing with Text Prompt
Zheng-Yan Sheng
Yang Ai
Li-Juan Liu
Jia Pan
Zhenhua Ling
26
6
0
13 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
65
39
0
03 Apr 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
18
2
0
08 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
30
0
0
01 Mar 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
16
35
0
02 Feb 2024
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
Dong Yang
Tomoki Koriyama
Yuki Saito
11
1
0
01 Feb 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
20
4
0
07 Jan 2024
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
13
12
0
25 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
16
19
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
21
3
0
15 Sep 2023
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
42
29
0
03 Oct 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
171
377
0
04 Dec 2021
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM
SSL
136
307
0
20 Oct 2020
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
201
819
0
12 Jun 2018