F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization

3 April 2025

Papers citing "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

31 / 31 papers shown

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

04 Dec 2025

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

233

04 Dec 2025

Step-Audio-EditX Technical Report

...

214

05 Nov 2025

Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator

177

23 Oct 2025

No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS

144

23 Sep 2025

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

168

28 Aug 2025

Multi-Metric Preference Alignment for Generative Speech Restoration

399

24 Aug 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

...

406

23 May 2025

Flow-GRPO: Training Flow Matching Models via Online RL

1.0K

319

08 May 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

Shehzeen Samarah Hussain

491

07 Feb 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

...

OffRL AI4TS LRM ReLM VLM

1.8K

5,342

22 Jan 2025

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

818

366

09 Oct 2024

Preference Alignment Improves Language Model-Based TTS

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Jinchuan Tian

Chunlei Zhang

Jiatong Shi

Hao Zhang

Jianwei Yu

Shinji Watanabe

Dong Yu

272

19 Sep 2024

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Xiaoxue Gao

Chen Zhang

Yiming Chen

Huayun Zhang

Nancy F. Chen

293

16 Sep 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Qian Chen

...

Qinglin Zhang

Shiliang Zhang

Nan Zhao

Siqi Zheng

AuLLM

481

140

04 Jul 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Sefik Emre Eskimez

Xiaofei Wang

Manthan Thakker

Canrun Li

Chung-Hsien Tsai

...

Min Tang

Xu Tan

Yanqing Liu

Sheng Zhao

Naoyuki Kanda

VLM

341

176

26 Jun 2024

Nemotron-4 340B Technical Report

Nvidia

Bo Adler

Niket Agarwal

Ashwath Aithal

...

Jimmy Zhang

Jing Zhang

Vivienne Zhang

Yian Zhang

Chen Zhu

339

122

17 Jun 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Xu Tan

Jinyu Li

Sheng Zhao

Yao Qian

Furu Wei

VLM

351

175

08 Jun 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Philip Anastassiou

Jiawei Chen

Jingshu Chen

Yuanzhe Chen

Zhuo Chen

...

407

316

04 Jun 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

...

575

1,094

07 May 2024

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

...

Soledad López Gambino

478

116

12 Feb 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao

Peiyi Wang

Runxin Xu

...

2.1K

5,487

05 Feb 2024

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

386

478

23 Jun 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Neural Information Processing Systems (NeurIPS), 2023

Christopher D. Manning

Chelsea Finn

ALM

1.1K

8,135

29 May 2023

Training Diffusion Models with Reinforcement Learning

International Conference on Learning Representations (ICLR), 2023

757

778

22 May 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

Interspeech, 2023

...

320

129

18 May 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

...

455

253

07 Mar 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023

...

616

1,138

05 Jan 2023

Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Shuai Wang

Binbin Zhang

359

217

31 Oct 2022

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

353

189

12 Oct 2021

Proximal Policy Optimization Algorithms

1.5K

26,647

20 Jul 2017