1904.02882
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
Papers citing "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"
50 / 617 papers shown
Title
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
26
0
0
18 Jun 2025
Manipulated Regions Localization For Partially Deepfake Audio: A Survey
Jiayi He
Jiangyan Yi
J. Tao
Siding Zeng
Hao Gu
11
0
0
17 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
7
0
0
16 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
20
0
0
16 Jun 2025
Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction
Xiaoran Fan
Zhichao Sun
Yangfan Gao
Jingfei Xiong
Hang Yan
...
Shaokang Dong
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
15
0
0
14 Jun 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
90
0
0
12 Jun 2025
A Review on Score-based Generative Models for Audio Applications
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
24
0
0
10 Jun 2025
Spectral Domain Neural Reconstruction for Passband FMCW Radars
Harshvardhan Takawale
Nirupam Roy
13
0
0
09 Jun 2025
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti
T. Nguyen
Alexander Waibel
DRL
137
0
0
04 Jun 2025
Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning
Sarenne Wallbridge
Christoph Minixhofer
Catherine Lai
P. Bell
SSL
50
0
0
03 Jun 2025
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Fengjin Li
Jie Wang
Yadong Niu
Yongqing Wang
Meng Meng
Jian Luan
Zhiyong Wu
49
0
0
03 Jun 2025
Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg
Caitlyn Bishop
Han Yi
Sarah Robinson
60
0
0
02 Jun 2025
DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation
Ming Meng
Ziyi Yang
Jian Yang
Zhenjie Su
Yonggui Zhu
Zhaoxin Fan
DiffM
VLM
35
0
0
01 Jun 2025
ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
Pengyu Ren
Wenhao Guan
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
21
0
0
01 Jun 2025
PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
Songjun Cao
Qinghua Wu
Jie Chen
Jin Li
Long Ma
35
0
0
01 Jun 2025
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang
Y. Qian
Xiaofei Wang
Manthan Thakker
Dongmei Wang
...
Haibin Wu
Yuxuan Hu
Jinyu Li
Yanmin Qian
Sheng Zhao
27
0
0
01 Jun 2025
DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model
Xueyuan Chen
Dongchao Yang
Wenxuan Wu
Minglin Wu
Jing Xu
Xixin Wu
Zhiyong Wu
Helen M. Meng
DiffM
24
0
0
31 May 2025
ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation
Jiatong Shi
Yifan Cheng
Bo-Hao Su
Hye-jin Shim
Jinchuan Tian
Samuele Cornell
Yiwen Zhao
Siddhant Arora
Shinji Watanabe
43
0
0
30 May 2025
Probing the Robustness Properties of Neural Speech Codecs
Wei-Cheng Tseng
David Harwath
35
0
0
30 May 2025
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
Jin Wang
Wenbin Jiang
Xiangbo Wang
23
0
0
30 May 2025
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
Jinming Zhang
Xuanru Zhou
Jiachen Lian
Shuhe Li
William Li
...
Zachary Miller
Jet M J Vonk
Brittany Morin
M. G. Tempini
Gopala Anumanchipalli
51
1
0
28 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
28
0
0
28 May 2025
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
Haoqin Sun
Jingguang Tian
Jiaming Zhou
Hui Wang
Jiabei He
...
Xiangyu Kong
Desheng Hu
Xinkang Xu
Xinhui Hu
Yong Qin
33
0
0
26 May 2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
Haonan Zhang
Run Luo
Xiong Liu
Yuchuan Wu
Ting-En Lin
...
Min Yang
Lianli Gao
Jingkuan Song
Fei Huang
Yongbin Li
AI4CE
76
0
0
26 May 2025
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
Haiyang Sun
Shujie Hu
Shujie Liu
L. Meng
Hui Wang
...
Yifan Yang
Yanqing Liu
Sheng Zhao
Yan Lu
Y. Qian
62
1
0
26 May 2025
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
Zhichao Wu
Yueteng Kang
Songjun Cao
Long Ma
Qiulin Li
Qun Yang
DiffM
52
0
0
24 May 2025
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
84
1
0
23 May 2025
Private kNN-VC: Interpretable Anonymization of Converted Speech
Carlos Franzreb
Arnab Das
Tim Polzehl
Sebastian Möller
23
0
0
23 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
43
0
0
22 May 2025
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Advait Joglekar
Divyanshu Singh
Rooshil Rohit Bhatia
S. Umesh
86
0
0
22 May 2025
Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate
Hanglei Zhang
Yiwei Guo
Zhihan Li
Xiang Hao
Xie Chen
Kai Yu
41
0
0
22 May 2025
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
Junchuan Zhao
Xintong Wang
Ye Wang
26
0
0
21 May 2025
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
Zijian Lin
Yang Zhang
Yougen Yuan
Yuming Yan
Jinjiang Liu
Zhiyong Wu
Pengfei Hu
Qun Yu
82
0
0
21 May 2025
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
Jixun Yao
Hexin Liu
Eng Siong Chng
Lei Xie
28
0
0
21 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
63
0
0
19 May 2025
Inference Attacks for X-Vector Speaker Anonymization
L. A. Bauer
Wenxuan Bao
Malvika Jadhav
Vincent Bindschaedler
71
0
0
13 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
79
0
0
12 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
106
0
0
08 May 2025
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
Zhongweiyang Xu
Xulin Fan
Zhong-Qiu Wang
Xilin Jiang
Romit Roy Choudhury
DiffM
160
0
0
08 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
75
2
0
06 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
432
0
0
02 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
Wen Liu
Dongdong Lin
81
0
0
29 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM
VLM
181
13
0
25 Apr 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
108
0
0
22 Apr 2025
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
Yue Li
Weizhi Liu
Dongdong Lin
160
0
0
21 Apr 2025
Protecting Your Voice: Temporal-aware Robust Watermarking
Yue Li
Weizhi Liu
Dongdong Lin
Hui Tian
Hongxia Wang
107
0
0
21 Apr 2025
SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures
Kuang Yuan
Yifeng Wang
Xiyuxing Zhang
Chengyi Shen
Swarun Kumar
Justin Chan
48
1
0
15 Apr 2025
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
89
2
0
15 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
111
1
0
14 Apr 2025