ResearchTrend.AI
arXiv: 2406.19188
Averaging log-likelihoods in direct alignment

27 June 2024
Nathan Grinsztajn
Yannis Flet-Berliac
M. G. Azar
Florian Strub
Bill Wu
Eugene Choi
Chris Cremer
Arash Ahmadian
Yash Chandak
Olivier Pietquin
Matthieu Geist
    MoMe

Papers citing "Averaging log-likelihoods in direct alignment"

2 / 2 papers shown
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
89
2
0
18 Mar 2025
Self-Improving Robust Preference Optimization
Eugene Choi
Arash Ahmadian
Matthieu Geist
Olivier Pietquin
M. G. Azar
28
8
0
03 Jun 2024