Provably Efficient Exploration in Policy Optimization

International Conference on Machine Learning (ICML), 2019
12 December 2019
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang

Papers citing "Provably Efficient Exploration in Policy Optimization"

50 / 217 papers shown
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
148
2
0
28 Oct 2025
On the Sample Complexity of Differentially Private Policy Optimization
Yi He
Xingyu Zhou
164
0
0
24 Oct 2025
Revisiting Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
Xiaoyun Zhang
Xiaojian Yuan
Di Huang
Wang You
Chen-Hao Hu
Jingqing Ruan
Kejiang Chen
Xing Hu
LRM
236
3
0
13 Oct 2025
Embracing Evolution: A Call for Body-Control Co-Design in Embodied Humanoid Robot
Guiliang Liu
Bo Yue
Yi Jin Kim
Kui Jia
190
1
0
03 Oct 2025
Trajectory Data Suffices for Statistically Efficient Policy Evaluation in Finite-Horizon Offline RL with Linear $q^π$-Realizability and Concentrability
Volodymyr Tkachuk
Csaba Szepesvári
Xiaoqi Tan
OffRL
145
0
0
03 Oct 2025
Sampling Complexity of TD and PPO in RKHS
Lu Zou
Wendi Ren
Weizhong Zhang
Liang Ding
Shuang Li
156
1
0
29 Sep 2025
Replicable Reinforcement Learning with Linear Function Approximation
Eric Eaton
Marcel Hussing
Michael Kearns
Aaron Roth
S. B. Sengupta
Jessica Sorrell
246
3
0
10 Sep 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
395
201
0
17 Jun 2025
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
Jiachen Hu
Rui Ai
Han Zhong
Xiaoyu Chen
L. Wang
Zhaoran Wang
Zhuoran Yang
249
0
0
11 Jun 2025
Linear Mixture Distributionally Robust Markov Decision Processes
Zhishuai Liu
Pan Xu
370
6
0
23 May 2025
CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning
Yexin Li
OffRL
479
2
0
23 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Hyungkyu Kang
Min-hwan Oh
OffRL
445
3
0
07 Mar 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
477
4
0
13 Feb 2025
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
495
0
0
09 Feb 2025
Online MDP with Transition Prototypes: A Robust Adaptive Approach
Shuo Sun
Meng Qi
Z. Shen
344
0
0
18 Dec 2024
Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards
Conference on Learning for Dynamics & Control (L4DC), 2024
A. Ahmad
Mehdi Kermanshah
Kevin J. Leahy
Zachary Serlin
H. Siu
Makai Mann
C. Vasile
Roberto Tron
C. Belta
OffRL
369
0
0
26 Nov 2024
Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
Neural Information Processing Systems (NeurIPS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
283
4
0
05 Nov 2024
Demystifying Linear MDPs and Novel Dynamics Aggregation Framework
International Conference on Learning Representations (ICLR), 2024
Joongkyu Lee
Min-hwan Oh
343
5
0
31 Oct 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
208
1
0
19 Oct 2024
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
401
12
0
03 Oct 2024
Dual Approximation Policy Optimization
Zhihan Xiong
Maryam Fazel
Lin Xiao
286
1
0
02 Oct 2024
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
Conference on Learning for Dynamics & Control (L4DC), 2024
Batuhan Yardim
Niao He
AI4CE
294
7
0
27 Aug 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
587
1
0
22 Aug 2024
Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
Ally Yalei Du
Lin F. Yang
Ruosong Wang
237
0
0
18 Jul 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
294
3
0
10 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Jean-Michel Poggi
390
1
0
08 Jul 2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
367
5
0
03 Jul 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
387
12
0
06 Jun 2024
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Haotian Hu
Yiqin Yang
Jianing Ye
Chengjie Wu
Ziqing Mai
Yujing Hu
Tangjie Lv
Changjie Fan
Qianchuan Zhao
Chongjie Zhang
OffRL
OnRL
318
9
0
31 May 2024
Mollification Effects of Policy Gradient Methods
Tao Wang
Sylvia Herbert
Sicun Gao
316
2
0
28 May 2024
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^π$-Realizability and Concentrability
Volodymyr Tkachuk
Gellert Weisz
Csaba Szepesvári
OffRL
242
3
0
27 May 2024
Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
Yilei Chen
Vittorio Giammarino
James Queeney
I. Paschalidis
284
1
0
26 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
International Conference on Machine Learning (ICML), 2024
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
383
6
0
13 May 2024
Imitation Learning in Discounted Linear MDPs without exploration assumptions
International Conference on Machine Learning (ICML), 2024
Luca Viano
Stratis Skoulakis
Volkan Cevher
358
9
0
03 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
785
119
0
29 Apr 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
356
6
0
19 Apr 2024
Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
Yingru Li
Zhi-Quan Luo
232
0
0
17 Mar 2024
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
375
7
0
07 Mar 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
269
4
0
04 Mar 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
415
3
0
28 Feb 2024
Truly No-Regret Learning in Constrained MDPs
Adrian Müller
Pragnya Alatur
Volkan Cevher
Giorgia Ramponi
Niao He
446
17
0
24 Feb 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
524
40
0
14 Feb 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Neural Information Processing Systems (NeurIPS), 2023
Guhao Feng
Han Zhong
OffRL
318
5
0
28 Dec 2023
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Ioannis Anagnostides
Ioannis Panageas
Gabriele Farina
Tuomas Sandholm
394
3
0
19 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
505
332
0
18 Dec 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Neural Information Processing Systems (NeurIPS), 2023
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
297
4
0
14 Nov 2023
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Ahmadreza Moradipari
M. Pedramfar
Modjtaba Shokrian Zini
Vaneet Aggarwal
339
6
0
30 Oct 2023
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Neural Information Processing Systems (NeurIPS), 2023
Nikki Lijing Kuang
Ming Yin
Mengdi Wang
Yu Wang
Yian Ma
365
7
0
29 Oct 2023
Unsupervised Behavior Extraction via Random Intent Priors
Neural Information Processing Systems (NeurIPS), 2023
Haotian Hu
Yiqin Yang
Jianing Ye
Ziqing Mai
Chongjie Zhang
OffRL
318
15
0
28 Oct 2023
A Doubly Robust Approach to Sparse Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Wonyoung Hedge Kim
Garud Iyengar
A. Zeevi
246
5
0
23 Oct 2023