arXiv:2502.05236 (v2, latest)
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
7 February 2025
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
Papers citing
"Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance"
49 / 49 papers shown
The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech
Julio Cesar Galdino
S. Leal
Leticia Gabriella De Souza
Rodrigo Lima
Antonio Nelson Fornari Mendes Moreira
Arnaldo Cândido Júnior
Miguel Oliveira Jr.
Edresson Casanova
S. Aluísio
65
0
0
06 Nov 2025
Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
H. Wang
Na Li
Chuke Wang
Shu Wu
Zhifeng Li
Dong Yu
DiffM
141
0
0
23 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
180
3
0
17 Oct 2025
Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models
Yizhou Peng
Yukun Ma
C. Zhang
Yi-Wen Chao
Chongjia Ni
B. Ma
99
0
0
15 Oct 2025
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Roy Fejgin
Ryan Langman
Mikyas T. Desta
Leili Tavabi
Jason Chun Lok Li
100
0
0
26 Sep 2025
Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation
Roy Fejgin
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Ryan Langman
Jaehyeon Kim
Subhankar Ghosh
Shehzeen Samarah Hussain
Jason Chun Lok Li
OffRL
130
0
0
23 Sep 2025
Multi-Metric Preference Alignment for Generative Speech Restoration
Junan Zhang
Xueyao Zhang
Jing Yang
Yuancheng Wang
Fan Fan
Zhizheng Wu
196
5
0
24 Aug 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
236
0
0
26 May 2025
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr Żelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
606
0
0
21 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xueyao Zhang
Yijiao Wang
Chaoren Wang
Hui Yuan
Zhuo Chen
Zhizheng Wu
677
11
0
07 May 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
476
13
0
03 Apr 2025
Classifier-free guidance in LLMs Safety
Roman Smirnov
MU
166
1
0
08 Dec 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
609
269
0
09 Oct 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
168
14
0
18 Sep 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
275
143
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
196
27
0
25 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
301
107
0
17 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech, 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
273
201
0
07 Jun 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALM
AuLLM
227
37
0
08 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.5K
3,768
0
05 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
297
13
0
18 Jan 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
International Conference on Machine Learning (ICML), 2024
Zixiang Chen
Yihe Deng
Huizhuo Yuan
Kaixuan Ji
Quanquan Gu
SyDa
559
445
0
02 Jan 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
609
843
0
18 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
180
7
0
14 Oct 2023
Finite Scalar Quantization: VQ-VAE Made Simple
International Conference on Learning Representations (ICLR), 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
343
348
0
27 Sep 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
296
112
0
14 Aug 2023
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
238
69
0
30 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
297
428
0
23 Jun 2023
CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
A. S. Soares
A. R. G. Filho
162
13
0
16 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
303
212
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Neural Information Processing Systems (NeurIPS), 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
292
561
0
11 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
871
6,697
0
29 May 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
International Conference on Machine Learning (ICML), 2023
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
180
44
0
13 Apr 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
330
239
0
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
5.9K
17,759
0
27 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
140
34
0
16 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
382
1,011
0
05 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
1.0K
5,722
0
06 Dec 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
309
988
0
24 Oct 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
394
813
0
07 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
475
5,304
0
26 Jul 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
2.1K
17,490
0
04 Mar 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
673
547
0
04 Dec 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
200
146
0
08 Oct 2021
One TTS Alignment To Rule Them All
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
199
102
0
23 Aug 2021
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
504
1,103
0
07 Jul 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech, 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
580
670
0
07 Dec 2020
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
322
1,187
0
05 Apr 2019
Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
1.6K
4,387
0
12 Jun 2017