Leveraging Demonstrations to Improve Online Learning: Quality Matters

7 February 2023

Papers citing "Leveraging Demonstrations to Improve Online Learning: Quality Matters"

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning; Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi; 25 1 0; 03 Apr 2025
Online Bandit Learning with Offline Preference Data for Improved RLHF; Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen; [OffRL]; 37 2 0; 13 Jun 2024
Sequential Best-Arm Identification with Application to Brain-Computer Interface; Xiaoping Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li; 27 2 0; 17 May 2023
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits; Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, C. Yu; [AI4TS]; 29 6 0; 30 Sep 2022
Regret Bounds for Information-Directed Reinforcement Learning; Botao Hao, Tor Lattimore; [OffRL]; 39 17 0; 09 Jun 2022
Training language models to follow instructions with human feedback; Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe; [OSLM, ALM]; 313 11,915 0; 04 Mar 2022
Fine-Tuning Language Models from Human Preferences; Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving; [ALM]; 280 1,587 0; 18 Sep 2019