arXiv:2504.02407 (v3, latest)
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
3 April 2025
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
GitHub (1541★)
Papers citing "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
31 / 31 papers shown
YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
Gongyu Chen
Xiaoyu Zhang
Zhenqiang Weng
Junjie Zheng
Da Shen
Chaofan Ding
Wei-Qiang Zhang
Zihao Chen
97
3
0
04 Dec 2025
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Junjie Zheng
Chunbo Hao
Guobin Ma
Xiaoyu Zhang
Gongyu Chen
Chaofan Ding
Zihao Chen
Lei Xie
DiffM
233
4
0
04 Dec 2025
Step-Audio-EditX Technical Report
Chao Yan
Boyong Wu
Peng Yang
Pengfei Tan
Guoqiang Hu
...
Xiangyu Zhang
Daxin Jiang
Shuchang Zhou
Gang Yu
214
3
0
05 Nov 2025
Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
H. Wang
Na Li
Chuke Wang
Shu Wu
Zhifeng Li
Dong Yu
DiffM
177
0
0
23 Oct 2025
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
Seungyoun Shin
Dongha Ahn
Jiwoo Kim
Sungwook Jeon
144
0
0
23 Sep 2025
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
Luozhijie Jin
Zijie Qiu
J. Liu
Zijie Diao
Lifeng Qiao
Ning Ding
Alex Lamb
Xipeng Qiu
AI4CE
168
4
0
28 Aug 2025
Multi-Metric Preference Alignment for Generative Speech Restoration
Junan Zhang
Xueyao Zhang
Jing Yang
Yuancheng Wang
Fan Fan
Zhizheng Wu
399
6
0
24 Aug 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
406
95
0
23 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
1.0K
319
0
08 May 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
491
22
0
07 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL
AI4TS
LRM
ReLM
VLM
1.8K
5,342
0
22 Jan 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
818
366
0
09 Oct 2024
Preference Alignment Improves Language Model-Based TTS
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jinchuan Tian
Chunlei Zhang
Jiatong Shi
Hao Zhang
Jianwei Yu
Shinji Watanabe
Dong Yu
272
25
0
19 Sep 2024
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Xiaoxue Gao
Chen Zhang
Yiming Chen
Huayun Zhang
Nancy F. Chen
293
35
0
16 Sep 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
481
140
0
04 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
341
176
0
26 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
339
122
0
17 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
351
175
0
08 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Yanzhe Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
407
316
0
04 Jun 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
...
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
MoE
575
1,094
0
07 May 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Łajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
478
116
0
12 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Junxiao Song
...
Haowei Zhang
Mingchuan Zhang
Y.K. Li
Y. Wu
Daya Guo
ReLM
LRM
2.1K
5,487
0
05 Feb 2024
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
386
478
0
23 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
8,135
0
29 May 2023
Training Diffusion Models with Reinforcement Learning
International Conference on Learning Representations (ICLR), 2024
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
757
778
0
22 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Interspeech, 2023
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
320
129
0
18 May 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Ziqiang Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
455
253
0
07 Mar 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu Wu
Ziqiang Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
616
1,138
0
05 Jan 2023
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Hongji Wang
Chengdong Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
359
217
0
31 Oct 2022
Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhengyang Chen
Sanyuan Chen
Yu Wu
Yao Qian
Chengyi Wang
Shujie Liu
Y. Qian
Michael Zeng
SSL
353
189
0
12 Oct 2021
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
1.5K
26,647
0
20 Jul 2017