RLVF: Learning from Verbal Feedback without Overgeneralization

RLVF: Learning from Verbal Feedback without Overgeneralization

16 February 2024

Alexander Khazatsky

Papers citing "RLVF: Learning from Verbal Feedback without Overgeneralization"

11 / 11 papers shown

Title
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization Emiliano Penaloza Tianyue H. Zhan Laurent Charlin Mateo Espinosa Zarlenga 40 0 0 25 Apr 2025
Alignment of Diffusion Models: Fundamentals, Challenges, and Future Buhua Liu Shitong Shao Bao Li Lichen Bai Zhiqiang Xu Haoyi Xiong James Kwok Sumi Helal Zeke Xie 37 11 0 11 Sep 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt Deng Cai Huayang Li Tingchen Fu Siheng Li Weiwen Xu ... Leyang Cui Yan Wang Lemao Liu Taro Watanabe Shuming Shi KELM 26 2 0 24 Jun 2024
Suppressing Pink Elephants with Direct Principle Feedback Louis Castricato Nathan Lile Suraj Anand Hailey Schoelkopf Siddharth Verma Stella Biderman 58 9 0 12 Feb 2024
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models Beatriz Borges Niket Tandon Tanja Kaser Antoine Bosselut 19 3 0 01 Jul 2023
Learning by Distilling Context Charles Burton Snell Dan Klein Ruiqi Zhong ReLM LRM 161 44 0 30 Sep 2022
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trkebacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 225 500 0 28 Sep 2022
Fast Model Editing at Scale E. Mitchell Charles Lin Antoine Bosselut Chelsea Finn Christopher D. Manning KELM 219 341 0 21 Oct 2021
Fine-Tuning Language Models from Human Preferences Daniel M. Ziegler Nisan Stiennon Jeff Wu Tom B. Brown Alec Radford Dario Amodei Paul Christiano G. Irving ALM 275 1,587 0 18 Sep 2019
Dialogue Learning With Human-In-The-Loop Jiwei Li Alexander H. Miller S. Chopra MarcÁurelio Ranzato Jason Weston OffRL 216 134 0 29 Nov 2016
Deep Reinforcement Learning for Dialogue Generation Jiwei Li Will Monroe Alan Ritter Michel Galley Jianfeng Gao Dan Jurafsky 198 1,327 0 05 Jun 2016