Sample Efficient Preference Alignment in LLMs via Active Exploration
arXiv: 2312.00267 · 1 December 2023
Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Willie Neiswanger, Stefano Ermon, Jeff Schneider
OffRL

Papers citing "Sample Efficient Preference Alignment in LLMs via Active Exploration" (7 of 7 papers shown)

Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei, Shenghua He, Tian Xia, Andy H. Wong, Jingyang Lin, Mei Han
ALM, ELM · 64 · 23 · 0 · 23 Aug 2024

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, P. Jaillet, K. H. Low
37 · 5 · 0 · 24 Jul 2024

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 313 · 11,953 · 0 · 04 Mar 2022

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
167 · 157 · 0 · 16 Oct 2021

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM · 280 · 1,595 · 0 · 18 Sep 2019

Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence, Stefan Riezler
OffRL · 173 · 56 · 0 · 03 May 2018

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani
UQCV, BDL · 285 · 9,138 · 0 · 06 Jun 2015