Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang
27 January 2019
arXiv: 1901.09311

Papers citing "Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP" (11 papers)

The Bandit Whisperer: Communication Learning for Restless Bandits
Yunfan Zhao, Tonghan Wang, Dheeraj M. Nagaraj, Aparna Taneja, Milind Tambe
11 Aug 2024

Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang, Vinzenz Thoma, Zebang Shen, H. Nax, Niao He
14 Jul 2024

Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang, Yuxin Chen, Jason D. Lee, S. Du
25 Jul 2023

Is Q-learning Provably Efficient?
Chi Jin, Zeyuan Allen-Zhu, Sébastien Bubeck, Michael I. Jordan
10 Jul 2018

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
Aaron Sidford, Mengdi Wang, X. Wu, Yinyu Ye
27 Oct 2017

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann, Tor Lattimore, Emma Brunskill
22 Mar 2017

Minimax Regret Bounds for Reinforcement Learning
M. G. Azar, Ian Osband, Rémi Munos
16 Mar 2017

Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adria Puigdomenech Badia, M. Berk Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
04 Feb 2016

Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
19 Feb 2015

Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
19 Dec 2013

PAC Bounds for Discounted MDPs
Tor Lattimore, Marcus Hutter
17 Feb 2012