Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2502.05236
Cited By
v1
v2 (latest)
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
7 February 2025
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance"
49 / 49 papers shown
Title
The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech
Julio Cesar Galdino
S. Leal
Leticia Gabriella De Souza
Rodrigo Lima
Antonio Nelson Fornari Mendes Moreira
Arnaldo Cândido Júnior
Miguel Oliveira Jr.
Edresson Casanova
S. Aluísio
32
0
0
06 Nov 2025
Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
H. Wang
Na Li
Chuke Wang
Shu Wu
Zhifeng Li
Dong Yu
DiffM
100
0
0
23 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
142
3
0
17 Oct 2025
Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models
Yizhou Peng
Yukun Ma
C. Zhang
Yi-Wen Chao
Chongjia Ni
B. Ma
57
0
0
15 Oct 2025
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Roy Fejgin
Ryan Langman
Mikyas T. Desta
Leili Tavabi
Jason Chun Lok Li
76
0
0
26 Sep 2025
Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation
Roy Fejgin
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Ryan Langman Jaehyeon Kim
Subhankar Ghosh
Shehzeen Samarah Hussain
Jason Chun Lok Li
OffRL
88
0
0
23 Sep 2025
Multi-Metric Preference Alignment for Generative Speech Restoration
Junan Zhang
Xueyao Zhang
Jing Yang
Yuancheng Wang
Fan Fan
Zhizheng Wu
128
4
0
24 Aug 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
175
0
0
26 May 2025
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr .Zelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
511
0
0
21 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xueyao Zhang
Yijiao Wang
Chaoren Wang
Hui Yuan
Zhuo Chen
Zhizheng Wu
616
10
0
07 May 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
384
13
0
03 Apr 2025
Classifier-free guidance in LLMs Safety
Roman Smirnov
MU
146
1
0
08 Dec 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Emmanouil Benetos
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
522
244
0
09 Oct 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
155
13
0
18 Sep 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
215
137
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
172
26
0
25 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
:
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
253
105
0
17 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech (Interspeech), 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
188
192
0
07 Jun 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALM
AuLLM
175
35
0
08 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.1K
3,549
0
05 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
237
13
0
18 Jan 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
International Conference on Machine Learning (ICML), 2024
Zixiang Chen
Yihe Deng
Huizhuo Yuan
Kaixuan Ji
Quanquan Gu
SyDa
482
414
0
02 Jan 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
501
816
0
18 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
150
7
0
14 Oct 2023
Finite Scalar Quantization: VQ-VAE Made Simple
International Conference on Learning Representations (ICLR), 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
274
330
0
27 Sep 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
247
109
0
14 Aug 2023
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
187
67
0
30 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
245
414
0
23 Jun 2023
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
A. S. Soares
A. R. G. Filho
158
13
0
16 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
250
205
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Neural Information Processing Systems (NeurIPS), 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
255
543
0
11 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
755
6,364
0
29 May 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
International Conference on Machine Learning (ICML), 2023
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
148
40
0
13 Apr 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
258
233
0
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
2.6K
17,255
0
27 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
132
33
0
16 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
362
985
0
05 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
880
5,505
0
06 Dec 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
228
955
0
24 Oct 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
344
793
0
07 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
422
5,152
0
26 Jul 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.9K
16,931
0
04 Mar 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
569
536
0
04 Dec 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
196
144
0
08 Oct 2021
One TTS Alignment To Rule Them All
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
163
101
0
23 Aug 2021
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
275
1,076
0
07 Jul 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech (Interspeech), 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
530
655
0
07 Dec 2020
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
295
1,173
0
05 Apr 2019
Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
1.2K
4,264
0
12 Jun 2017
1