Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.10500
Cited By
Active Preference Optimization for Sample Efficient RLHF
16 February 2024
Nirjhar Das
Souradip Chakraborty
Aldo Pacchiano
Sayak Ray Chowdhury
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Active Preference Optimization for Sample Efficient RLHF"
10 / 10 papers shown
Title
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
108
2
0
29 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
20
0
0
03 Apr 2025
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
24
4
0
19 Jul 2024
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury
Anush Kini
Nagarajan Natarajan
22
54
0
01 Mar 2024
Active Preference Learning for Large Language Models
William Muldrew
Peter Hayes
Mingtian Zhang
David Barber
26
16
0
12 Feb 2024
Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta
Vikramjeet Das
Ojash Neopane
Yijia Dai
Ilija Bogunovic
Ilija Bogunovic
W. Neiswanger
Stefano Ermon
Jeff Schneider
Willie Neiswanger
OffRL
25
12
0
01 Dec 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
118
59
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
Marc Abeille
Louis Faury
Clément Calauzènes
96
37
0
23 Oct 2020
1