Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.08243
Cited By
Optimistic Policy Optimization with Bandit Feedback
19 February 2020
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Optimistic Policy Optimization with Bandit Feedback"
32 / 32 papers shown
Title
Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu
Chen-Yu Wei
Julian Zimmert
86
0
0
09 Feb 2025
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
39
0
0
30 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
31
3
0
13 May 2024
Imitation Learning in Discounted Linear MDPs without exploration assumptions
Luca Viano
Stratis Skoulakis
V. Cevher
30
3
0
03 May 2024
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
35
26
0
15 May 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
26
12
0
14 Feb 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
32
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
31
12
0
30 Jan 2023
Eluder-based Regret for Stochastic Contextual MDPs
Orin Levy
Asaf B. Cassel
Alon Cohen
Yishay Mansour
33
5
0
27 Nov 2022
Regret Minimization and Convergence to Equilibria in General-sum Markov Games
Liad Erez
Tal Lancewicki
Uri Sherman
Tomer Koren
Yishay Mansour
42
25
0
28 Jul 2022
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu
Xiaohan Wei
Jieping Ye
Zhaoran Wang
Zhuoran Yang
OffRL
32
11
0
25 Jul 2022
Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits
Qinghua Liu
Yuanhao Wang
Chi Jin
AAML
32
15
0
14 Mar 2022
Policy Optimization for Stochastic Shortest Path
Liyu Chen
Haipeng Luo
Aviv A. Rosenberg
19
12
0
07 Feb 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
Liyu Chen
R. Jain
Haipeng Luo
57
25
0
31 Jan 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
74
21
0
31 Jan 2022
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
Han Zhong
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
31
30
0
27 Dec 2021
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
55
14
0
21 Dec 2021
Differentially Private Regret Minimization in Episodic Markov Decision Processes
Sayak Ray Chowdhury
Xingyu Zhou
26
21
0
20 Dec 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
38
44
0
18 Jul 2021
Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
27
271
0
13 Jun 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
32
52
0
24 Mar 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
95
24
0
17 Feb 2021
Online Apprenticeship Learning
Lior Shani
Tom Zahavy
Shie Mannor
OffRL
24
25
0
13 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
19
10
0
11 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
38
32
0
29 Dec 2020
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. DÓro
Marcello Restelli
OffRL
27
10
0
15 Dec 2020
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Liyu Chen
Haipeng Luo
Chen-Yu Wei
29
32
0
07 Dec 2020
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang
Chi Jin
Zhaoran Wang
Mengdi Wang
Michael I. Jordan
39
18
0
09 Nov 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
42
101
0
22 Oct 2020
Exploration-Exploitation in Constrained MDPs
Yonathan Efroni
Shie Mannor
Matteo Pirotta
33
171
0
04 Mar 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
107
100
0
15 Oct 2019
1