ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.07637
  4. Cited By
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

13 May 2024
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
    OffRL
ArXivPDFHTML

Papers citing "Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback"

6 / 6 papers shown
Title
Warm-up Free Policy Optimization: Improved Regret in Linear Markov
  Decision Processes
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
24
1
0
03 Jul 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
45
23
0
23 May 2024
Model-free Posterior Sampling via Learning Rate Randomization
Model-free Posterior Sampling via Learning Rate Randomization
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
Alexey Naumov
Pierre Perrault
Michal Valko
Pierre Menard
OffRL
15
3
0
27 Oct 2023
Human-in-the-loop: Provably Efficient Preference-based Reinforcement
  Learning with General Function Approximation
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
118
60
0
23 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
61
21
0
31 Jan 2022
1