2406.19188
Cited By
Averaging log-likelihoods in direct alignment
27 June 2024
Nathan Grinsztajn
Yannis Flet-Berliac
M. G. Azar
Florian Strub
Bill Wu
Eugene Choi
Chris Cremer
Arash Ahmadian
Yash Chandak
Olivier Pietquin
Matthieu Geist
MoMe
Papers citing "Averaging log-likelihoods in direct alignment"
2 / 2 papers shown
Title
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
89
2
0
18 Mar 2025
Self-Improving Robust Preference Optimization
Eugene Choi
Arash Ahmadian
Matthieu Geist
Olivier Pietquin
M. G. Azar
28
8
0
03 Jun 2024