Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

7 February 2025
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li

Papers citing "Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance"

49 / 49 papers shown
The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech
Julio Cesar Galdino
S. Leal
Leticia Gabriella De Souza
Rodrigo Lima
Antonio Nelson Fornari Mendes Moreira
Arnaldo Cândido Júnior
Miguel Oliveira Jr.
Edresson Casanova
S. Aluísio
32
0
0
06 Nov 2025
Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
H. Wang
Na Li
Chuke Wang
Shu Wu
Zhifeng Li
Dong Yu
DiffM
100
0
0
23 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
142
3
0
17 Oct 2025
Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models
Yizhou Peng
Yukun Ma
C. Zhang
Yi-Wen Chao
Chongjia Ni
B. Ma
57
0
0
15 Oct 2025
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Roy Fejgin
Ryan Langman
Mikyas T. Desta
Leili Tavabi
Jason Chun Lok Li
76
0
0
26 Sep 2025
Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation
Roy Fejgin
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Ryan Langman
Jaehyeon Kim
Subhankar Ghosh
Shehzeen Samarah Hussain
Jason Chun Lok Li
OffRL
88
0
0
23 Sep 2025
Multi-Metric Preference Alignment for Generative Speech Restoration
Junan Zhang
Xueyao Zhang
Jing Yang
Yuancheng Wang
Fan Fan
Zhizheng Wu
128
4
0
24 Aug 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
175
0
0
26 May 2025
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr Żelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
511
0
0
21 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xueyao Zhang
Yijiao Wang
Chaoren Wang
Hui Yuan
Zhuo Chen
Zhizheng Wu
616
10
0
07 May 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
384
13
0
03 Apr 2025
Classifier-free guidance in LLMs Safety
Roman Smirnov
MU
146
1
0
08 Dec 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
522
244
0
09 Oct 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
155
13
0
18 Sep 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
215
137
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
172
26
0
25 Jun 2024
Nemotron-4 340B Technical Report
NVIDIA
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
253
105
0
17 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech, 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
188
192
0
07 Jun 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALM
AuLLM
175
35
0
08 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.1K
3,549
0
05 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
237
13
0
18 Jan 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
International Conference on Machine Learning (ICML), 2024
Zixiang Chen
Yihe Deng
Huizhuo Yuan
Kaixuan Ji
Quanquan Gu
SyDa
482
414
0
02 Jan 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
501
816
0
18 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
150
7
0
14 Oct 2023
Finite Scalar Quantization: VQ-VAE Made Simple
International Conference on Learning Representations (ICLR), 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
274
330
0
27 Sep 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
247
109
0
14 Aug 2023
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
187
67
0
30 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
245
414
0
23 Jun 2023
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
A. S. Soares
A. R. G. Filho
158
13
0
16 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
250
205
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Neural Information Processing Systems (NeurIPS), 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
255
543
0
11 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
755
6,364
0
29 May 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
International Conference on Machine Learning (ICML), 2023
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
148
40
0
13 Apr 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
258
233
0
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
2.6K
17,255
0
27 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
132
33
0
16 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
362
985
0
05 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
880
5,505
0
06 Dec 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
228
955
0
24 Oct 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
344
793
0
07 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
422
5,152
0
26 Jul 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.9K
16,931
0
04 Mar 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
569
536
0
04 Dec 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
196
144
0
08 Oct 2021
One TTS Alignment To Rule Them All
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
163
101
0
23 Aug 2021
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
275
1,076
0
07 Jul 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech, 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
530
655
0
07 Dec 2020
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
295
1,173
0
05 Apr 2019
Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
1.2K
4,264
0
12 Jun 2017