2405.07637
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
13 May 2024
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
Papers citing "Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback"
6 / 6 papers shown
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
24
1
0
03 Jul 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
45
23
0
23 May 2024
Model-free Posterior Sampling via Learning Rate Randomization
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
Alexey Naumov
Pierre Perrault
Michal Valko
Pierre Menard
OffRL
15
3
0
27 Oct 2023
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
118
60
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
61
21
0
31 Jan 2022