2401.07181
Cited By
Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization
14 January 2024
Houda Nait El Barj
Théophile Sautory
Papers citing "Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization"
4 / 4 papers shown
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
ELM | ReLM | LRM
11 | 24 | 0
17 Feb 2024
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM
273 | 1,561 | 0
18 Sep 2019
Constructing Unrestricted Adversarial Examples with Generative Models
Yang Song, Rui Shu, Nate Kushman, Stefano Ermon
GAN | AAML
166 | 300 | 0
21 May 2018
AI safety via debate
G. Irving, Paul Christiano, Dario Amodei
196 | 199 | 0
02 May 2018