Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.04642
Cited By
Teaching Large Language Models to Reason with Reinforcement Learning
7 March 2024
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Teaching Large Language Models to Reason with Reinforcement Learning"
14 / 14 papers shown
Title
Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng
Ruiqi Xue
Lei Yuan
Yang Yu
Ning Ding
M. Liu
Bingzhao Gao
Jian-jun Sun
Gang Wang
AI4CE
40
0
0
08 May 2025
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
74
4
0
23 Oct 2024
Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath
Changsoo Jung
Ethan Seefried
Nikhil Krishnaswamy
38
1
0
11 Oct 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao
Hanze Dong
Amrita Saha
Caiming Xiong
Doyen Sahoo
LRM
27
3
0
10 Oct 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang
Wei Xiong
Lichang Chen
Tianyi Zhou
Heng Huang
Tong Zhang
ALM
22
10
0
18 Sep 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
49
16
0
24 Jun 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
54
5
0
22 May 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
95
121
0
10 Oct 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
220
495
0
28 Sep 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
116
232
0
05 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning
Xiangyu Peng
Siyan Li
Sarah Wiegreffe
Mark O. Riedl
LRM
38
38
0
04 May 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1