Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

7 February 2025

Shehzeen Samarah Hussain

Papers citing "Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance"

The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech

Julio Cesar Galdino

S. Leal

Leticia Gabriella De Souza

Rodrigo Lima

Antonio Nelson Fornari Mendes Moreira

Arnaldo Cândido Júnior

Miguel Oliveira Jr.

Edresson Casanova

S. Aluísio

06 Nov 2025

Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator

23 Oct 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

...

17 Oct 2025

Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models

15 Oct 2025

Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization

Shehzeen Samarah Hussain

26 Sep 2025

Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation

Ryan Langman

Jaehyeon Kim

Subhankar Ghosh

Shehzeen Samarah Hussain

Jason Chun Lok Li

23 Sep 2025

Multi-Metric Preference Alignment for Generative Speech Restoration

24 Aug 2025

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation

26 May 2025

SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

21 May 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

07 May 2025

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization

03 Apr 2025

References

Classifier-free guidance in LLMs Safety

Roman Smirnov

08 Dec 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

09 Oct 2024

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Edresson Casanova

Ryan Langman

Paarth Neekhara

Shehzeen Samarah Hussain

Subhankar Ghosh

18 Sep 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Sefik Emre Eskimez

Xiaofei Wang

Manthan Thakker

Canrun Li

Chung-Hsien Tsai

...

Min Tang

Xu Tan

Yanqing Liu

Sheng Zhao

Naoyuki Kanda

26 Jun 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Paarth Neekhara

Shehzeen Samarah Hussain

Subhankar Ghosh

25 Jun 2024

Nemotron-4 340B Technical Report

Nvidia

Bo Adler

Niket Agarwal

Ashwath Aithal

...

Jimmy Zhang

Jing Zhang

Vivienne Zhang

Yian Zhang

Chen Zhu

17 Jun 2024

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

Interspeech, 2024

...

07 Jun 2024

SpeechAlign: Aligning Speech Generation to Human Preferences

Xipeng Qiu

08 Apr 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao

Peiyi Wang

Runxin Xu

...

05 Feb 2024

Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?

Marcio Fonseca

Shay B. Cohen

18 Jan 2024

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

International Conference on Machine Learning (ICML), 2024

Quanquan Gu

02 Jan 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences

International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

Bilal Piot

Daniele Calandriello

18 Oct 2023

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Paarth Neekhara

Shehzeen Samarah Hussain

Boris Ginsburg

Shlomo Dubnov

14 Oct 2023

Finite Scalar Quantization: VQ-VAE Made Simple

International Conference on Learning Representations (ICLR), 2023

27 Sep 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

14 Aug 2023

Stay on topic with Classifier-Free Guidance

Pawan Sasanka Ammanamanchi

Stella Biderman

30 Jun 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

23 Jun 2023

CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages

F. S. Oliveira

Edresson Casanova

Arnaldo Cândido Júnior

A. S. Soares

A. R. G. Filho

16 Jun 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

13 Jun 2023

High-Fidelity Audio Compression with Improved RVQGAN

Neural Information Processing Systems (NeurIPS), 2023

11 Jun 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Neural Information Processing Systems (NeurIPS), 2023

Christopher D. Manning

Chelsea Finn

29 May 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

International Conference on Machine Learning (ICML), 2023

Hainan Xu

Boris Ginsburg

13 Apr 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

...

07 Mar 2023

LLaMA: Open and Efficient Foundation Language Models

...

27 Feb 2023

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shehzeen Samarah Hussain

Paarth Neekhara

Jocelyn Huang

Jason Chun Lok Li

Boris Ginsburg

16 Feb 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

...

05 Jan 2023

Robust Speech Recognition via Large-Scale Weak Supervision

International Conference on Machine Learning (ICML), 2022

06 Dec 2022

High Fidelity Neural Audio Compression

Alexandre Défossez

Jade Copet

Gabriel Synnaeve

Yossi Adi

24 Oct 2022

AudioLM: a Language Modeling Approach to Audio Generation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022

Olivier Pietquin

...

07 Sep 2022

Classifier-Free Diffusion Guidance

Jonathan Ho

Tim Salimans

26 Jul 2022

Training language models to follow instructions with human feedback

Neural Information Processing Systems (NeurIPS), 2022

Carroll L. Wainwright

...

04 Mar 2022

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

International Conference on Machine Learning (ICML), 2021

Edresson Casanova

Julian Weber

C. Shulby

Arnaldo Cândido Júnior

Eren Golge

M. Ponti

04 Dec 2021

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Nithin Rao Koluguri

Taejin Park

Boris Ginsburg

08 Oct 2021

One TTS Alignment To Rule Them All

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

23 Aug 2021

SoundStream: An End-to-End Neural Audio Codec

07 Jul 2021

MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020

07 Dec 2020

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

05 Apr 2019

Deep reinforcement learning from human preferences

Neural Information Processing Systems (NeurIPS), 2017

12 Jun 2017