2102.00490
Online Markov Decision Processes with Aggregate Bandit Feedback
31 January 2021
Alon Cohen
Haim Kaplan
Tomer Koren
Yishay Mansour
OffRL
Papers citing
"Online Markov Decision Processes with Aggregate Bandit Feedback"
8 / 8 papers shown
Title
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
194
0
0
12 May 2025
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
97
34
0
23 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
64
4
0
13 May 2024
Online Resource Allocation in Episodic Markov Decision Processes
Duksang Lee
William Overman
Dabeen Lee
58
1
0
18 May 2023
Dynamic Regret of Online Markov Decision Processes
Peng Zhao
Longfei Li
Zhi Zhou
OffRL
95
17
0
26 Aug 2022
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
Tengyu Xu
Yue Wang
Shaofeng Zou
Yingbin Liang
OffRL
70
13
0
13 Jun 2022
Reinforcement Learning with a Terminator
Guy Tennenholtz
Nadav Merlis
Lior Shani
Shie Mannor
Uri Shalit
Gal Chechik
Assaf Hallak
Gal Dalal
59
5
0
30 May 2022
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Niladri S. Chatterji
Aldo Pacchiano
Peter L. Bartlett
Michael I. Jordan
OffRL
79
26
0
29 May 2021