arXiv:1912.05830, v4 (latest)
Provably Efficient Exploration in Policy Optimization
International Conference on Machine Learning (ICML), 2019
12 December 2019
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
Papers citing
"Provably Efficient Exploration in Policy Optimization"
50 / 217 papers shown
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
148
2
0
28 Oct 2025
On the Sample Complexity of Differentially Private Policy Optimization
Yi He
Xingyu Zhou
164
0
0
24 Oct 2025
Revisiting Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
Xiaoyun Zhang
Xiaojian Yuan
Di Huang
Wang You
Chen-Hao Hu
Jingqing Ruan
Kejiang Chen
Xing Hu
LRM
236
3
0
13 Oct 2025
Embracing Evolution: A Call for Body-Control Co-Design in Embodied Humanoid Robot
Guiliang Liu
Bo Yue
Yi Jin Kim
Kui Jia
190
1
0
03 Oct 2025
Trajectory Data Suffices for Statistically Efficient Policy Evaluation in Finite-Horizon Offline RL with Linear q^π-Realizability and Concentrability
Volodymyr Tkachuk
Csaba Szepesvári
Xiaoqi Tan
OffRL
145
0
0
03 Oct 2025
Sampling Complexity of TD and PPO in RKHS
Lu Zou
Wendi Ren
Weizhong Zhang
Liang Ding
Shuang Li
156
1
0
29 Sep 2025
Replicable Reinforcement Learning with Linear Function Approximation
Eric Eaton
Marcel Hussing
Michael Kearns
Aaron Roth
S. B. Sengupta
Jessica Sorrell
246
3
0
10 Sep 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
395
201
0
17 Jun 2025
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
Jiachen Hu
Rui Ai
Han Zhong
Xiaoyu Chen
L. Wang
Zhaoran Wang
Zhuoran Yang
249
0
0
11 Jun 2025
Linear Mixture Distributionally Robust Markov Decision Processes
Zhishuai Liu
Pan Xu
370
6
0
23 May 2025
CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning
Yexin Li
OffRL
479
2
0
23 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Hyungkyu Kang
Min-hwan Oh
OffRL
445
3
0
07 Mar 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
477
4
0
13 Feb 2025
Towards a Sharp Analysis of Offline Policy Learning for f-Divergence-Regularized Contextual Bandits
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
495
0
0
09 Feb 2025
Online MDP with Transition Prototypes: A Robust Adaptive Approach
Shuo Sun
Meng Qi
Z. Shen
344
0
0
18 Dec 2024
Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards
Conference on Learning for Dynamics & Control (L4DC), 2024
A. Ahmad
Mehdi Kermanshah
Kevin J. Leahy
Zachary Serlin
H. Siu
Makai Mann
C. Vasile
Roberto Tron
C. Belta
OffRL
369
0
0
26 Nov 2024
Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
Neural Information Processing Systems (NeurIPS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
283
4
0
05 Nov 2024
Demystifying Linear MDPs and Novel Dynamics Aggregation Framework
International Conference on Learning Representations (ICLR), 2024
Joongkyu Lee
Min-hwan Oh
343
5
0
31 Oct 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
208
1
0
19 Oct 2024
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
401
12
0
03 Oct 2024
Dual Approximation Policy Optimization
Zhihan Xiong
Maryam Fazel
Lin Xiao
286
1
0
02 Oct 2024
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
Conference on Learning for Dynamics & Control (L4DC), 2024
Batuhan Yardim
Niao He
AI4CE
294
7
0
27 Aug 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
587
1
0
22 Aug 2024
Misspecified Q-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
Ally Yalei Du
Lin F. Yang
Ruosong Wang
237
0
0
18 Jul 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
294
3
0
10 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Jean-Michel Poggi
390
1
0
08 Jul 2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
367
5
0
03 Jul 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
387
12
0
06 Jun 2024
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Haotian Hu
Yiqin Yang
Jianing Ye
Chengjie Wu
Ziqing Mai
Yujing Hu
Tangjie Lv
Changjie Fan
Qianchuan Zhao
Chongjie Zhang
OffRL
OnRL
318
9
0
31 May 2024
Mollification Effects of Policy Gradient Methods
Tao Wang
Sylvia Herbert
Sicun Gao
316
2
0
28 May 2024
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability
Volodymyr Tkachuk
Gellert Weisz
Csaba Szepesvári
OffRL
242
3
0
27 May 2024
Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
Yilei Chen
Vittorio Giammarino
James Queeney
I. Paschalidis
284
1
0
26 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
International Conference on Machine Learning (ICML), 2024
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
383
6
0
13 May 2024
Imitation Learning in Discounted Linear MDPs without exploration assumptions
International Conference on Machine Learning (ICML), 2024
Luca Viano
Stratis Skoulakis
Volkan Cevher
358
9
0
03 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
785
119
0
29 Apr 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
356
6
0
19 Apr 2024
Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
Yingru Li
Zhi-Quan Luo
232
0
0
17 Mar 2024
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
375
7
0
07 Mar 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
269
4
0
04 Mar 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
415
3
0
28 Feb 2024
Truly No-Regret Learning in Constrained MDPs
Adrian Müller
Pragnya Alatur
Volkan Cevher
Giorgia Ramponi
Niao He
446
17
0
24 Feb 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
524
40
0
14 Feb 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Neural Information Processing Systems (NeurIPS), 2023
Guhao Feng
Han Zhong
OffRL
318
5
0
28 Dec 2023
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Ioannis Anagnostides
Ioannis Panageas
Gabriele Farina
Tuomas Sandholm
394
3
0
19 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
505
332
0
18 Dec 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Neural Information Processing Systems (NeurIPS), 2023
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
297
4
0
14 Nov 2023
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Ahmadreza Moradipari
M. Pedramfar
Modjtaba Shokrian Zini
Vaneet Aggarwal
339
6
0
30 Oct 2023
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Neural Information Processing Systems (NeurIPS), 2023
Nikki Lijing Kuang
Ming Yin
Mengdi Wang
Yu Wang
Yian Ma
365
7
0
29 Oct 2023
Unsupervised Behavior Extraction via Random Intent Priors
Neural Information Processing Systems (NeurIPS), 2023
Haotian Hu
Yiqin Yang
Jianing Ye
Ziqing Mai
Chongjie Zhang
OffRL
318
15
0
28 Oct 2023
A Doubly Robust Approach to Sparse Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Wonyoung Hedge Kim
Garud Iyengar
A. Zeevi
246
5
0
23 Oct 2023