Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu, Jiantao Jiao, Michael I. Jordan
26 January 2023 (arXiv:2301.11270)
Tags: OffRL
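
The base paper learns a reward model from human comparison feedback by maximum likelihood under the Bradley-Terry-Luce model (pairwise) and its Plackett-Luce generalization ($K$-wise), then optimizes a pessimistic policy against the learned reward. As a rough, minimal sketch of the pairwise case only, the snippet below fits a linear reward r(x) = θ·φ(x) to synthetic preference data by logistic maximum likelihood. The data, feature map, and function names are illustrative assumptions, not the paper's code; the $K$-wise estimator and the pessimism step are not shown.

```python
# Minimal sketch (assumed setup, not the paper's code): maximum-likelihood
# estimation of a linear reward model under the Bradley-Terry model of
# pairwise comparisons.
import numpy as np

def bt_nll_grad(theta, diffs, labels):
    """Negative log-likelihood and gradient of pairwise labels under
    Bradley-Terry: P(first preferred) = sigmoid(theta @ (phi1 - phi2))."""
    logits = diffs @ theta
    probs = 1.0 / (1.0 + np.exp(-logits))
    nll = -np.mean(labels * np.log(probs + 1e-12)
                   + (1.0 - labels) * np.log(1.0 - probs + 1e-12))
    grad = diffs.T @ (probs - labels) / len(labels)
    return nll, grad

def fit_reward(diffs, labels, lr=0.5, steps=500):
    """Plain gradient descent on the Bradley-Terry negative log-likelihood."""
    theta = np.zeros(diffs.shape[1])
    for _ in range(steps):
        _, grad = bt_nll_grad(theta, diffs, labels)
        theta -= lr * grad
    return theta

# Toy usage with a made-up 2-D "true" reward direction.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])
pairs = rng.normal(size=(200, 2, 2))    # 200 comparisons, 2 candidates, 2 features
diffs = pairs[:, 0] - pairs[:, 1]       # feature differences phi1 - phi2
p_prefer = 1.0 / (1.0 + np.exp(-diffs @ theta_star))
labels = rng.binomial(1, p_prefer).astype(float)
theta_hat = fit_reward(diffs, labels)
print(theta_hat)                        # should land near theta_star
```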

Papers citing "Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons"

36 papers shown.
Semantic Probabilistic Control of Language Models
Kareem Ahmed, Catarina G Belém, Padhraic Smyth, Sameer Singh
04 May 2025

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu, Ethan X. Fang, Junwei Lu
27 Apr 2025

Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi, Somtochukwu Oguchienti, Maheed H. Ahmed, Mahsa Ghasemi
Tags: OffRL
20 Apr 2025

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025

Evaluating and Aligning Human Economic Risk Preferences in LLMs
J. Liu, Yi Yang, K. Tam
09 Mar 2025

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
Tags: OffRL
20 Feb 2025

Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni, Jonathan Colaço-Carr, Yash More, Jackie CK Cheung, G. Farnadi
12 Nov 2024

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang
Tags: OffRL
07 Nov 2024

Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
H. Fernando, Han Shen, Parikshit Ram, Yi Zhou, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen
Tags: CLL
20 Oct 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang, Zhihan Liu, Boyi Liu, Y. Zhang, Yingxiang Yang, Y. Liu, Liyu Chen, Tao Sun, Z. Wang
10 Oct 2024

The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi, Runlong Zhou, Simon S. Du
29 Sep 2024

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, P. Jaillet, K. H. Low
24 Jul 2024

Preference Elicitation for Offline Reinforcement Learning
Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi
Tags: OffRL
26 Jun 2024

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
Tags: OffRL
11 Jun 2024

Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant
29 May 2024

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo
Tags: OffRL
20 May 2024

Comparisons Are All You Need for Optimizing Smooth Functions
Chenyi Zhang, Tongyang Li
Tags: AAML
19 May 2024

The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen, Liwei Wang
18 May 2024

Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson
05 May 2024

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di, Jiafan He, Quanquan Gu
16 Apr 2024

Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji, Jiafan He, Quanquan Gu
14 Feb 2024

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
29 Jan 2024

Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu, Ting-En Lin, Yuchuan Wu, Min Yang, Fei Huang, Yongbin Li
Tags: ALM
10 Oct 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
28 Sep 2023

RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Kevin Kaichuang Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
Tags: ALM
24 Jul 2023

Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li, Zhuoran Yang, Mengdi Wang
Tags: OffRL
29 May 2023

Reward Collapse in Aligning Large Language Models
Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su
Tags: ALM
28 May 2023

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, Lichao Sun
07 Mar 2023

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
Tags: ALM, AAML
28 Sep 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, Liwei Wang
23 May 2022

Perspectives on Incorporating Expert Feedback into Model Updates
Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar
13 May 2022

Teaching language models to support answers with verified quotes
Jacob Menick, Maja Trębacz, Vladimir Mikulik, John Aslanides, Francis Song, ..., Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, G. Irving, Nat McAleese
Tags: ELM, RALM
21 Mar 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
04 Mar 2022

Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov, Ashvin Nair, Sergey Levine
Tags: OffRL
12 Oct 2021

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
Tags: ALM
18 Sep 2019