ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14718
  4. Cited By
Leftover Lunch: Advantage-based Offline Reinforcement Learning for
  Language Models

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

24 May 2023
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
ArXivPDFHTML

Papers citing "Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models"

7 / 7 papers shown
Title
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Z. Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
35
1
0
20 Feb 2025
Self-playing Adversarial Language Game Enhances LLM Reasoning
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
77
28
0
16 Apr 2024
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
23
17
0
18 Sep 2023
Offline RL for Natural Language Generation with Implicit Language Q
  Learning
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
121
101
0
05 Jun 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Approaching Neural Grammatical Error Correction as a Low-Resource
  Machine Translation Task
Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
Marcin Junczys-Dowmunt
Roman Grundkiewicz
Shubha Guha
Kenneth Heafield
33
190
0
16 Apr 2018
1