Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

16 October 2023

Mengdi Wang

Papers citing "Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks"

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Ming Yin, Mengdi Wang, Yu-Xiang Wang
OffRL · 43 · 11 · 0 · 03 Oct 2022

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
ALM, AAML · 225 · 495 · 0 · 28 Sep 2022

Teaching language models to support answers with verified quotes
Jacob Menick, Maja Trębacz, Vladimir Mikulik, John Aslanides, Francis Song, ..., Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, G. Irving, Nat McAleese
ELM, RALM · 235 · 255 · 0 · 21 Mar 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 303 · 11,730 · 0 · 04 Mar 2022

The Intrinsic Dimension of Images and Its Impact on Learning
Phillip E. Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein
189 · 256 · 0 · 18 Apr 2021

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
OffRL, GP · 329 · 1,944 · 0 · 04 May 2020