Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

24 March 2021
Andrea Zanette
Ching-An Cheng
Alekh Agarwal

Papers citing "Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation"

48 / 48 papers shown
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
40
0
0
21 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
77
44
0
31 Dec 2024
Dual Approximation Policy Optimization
Zhihan Xiong
Maryam Fazel
Lin Xiao
15
1
0
02 Oct 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
64
1
0
22 Aug 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Shuang Qiu
Mladen Kolar
Tong Zhang
OffRL
25
0
0
10 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
0
0
08 Jul 2024
Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
Noah Golowich
Ankur Moitra
OffRL
25
1
0
17 Jun 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
19
1
0
04 Mar 2024
Learning mirror maps in policy mirror descent
Carlo Alfano
Sebastian Towers
Silvia Sapora
Chris Xiaoxuan Lu
Patrick Rebeschini
17
0
0
07 Feb 2024
Sample Complexity Characterization for Linear Contextual MDPs
Junze Deng
Yuan-Chia Cheng
Shaofeng Zou
Yingbin Liang
25
1
0
05 Feb 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
36
155
0
18 Dec 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
22
2
0
14 Nov 2023
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
13
6
0
17 Oct 2023
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman
Alon Cohen
Tomer Koren
Yishay Mansour
23
7
0
28 Aug 2023
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
Yuan-Chia Cheng
J. Yang
Yitao Liang
OOD
25
1
0
10 Aug 2023
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
Ruiquan Huang
Yitao Liang
J. Yang
OffRL
16
5
0
01 Jul 2023
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo
Zihao Li
Huazheng Wang
Mengdi Wang
Zhuoran Yang
Xuezhou Zhang
25
5
0
21 Jun 2023
On the Model-Misspecification in Reinforcement Learning
Yunfan Li
Lin F. Yang
28
5
0
19 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
16
4
0
15 Jun 2023
Improving Offline RL by Blending Heuristics
Sinong Geng
Aldo Pacchiano
Andrey Kolobov
Ching-An Cheng
OffRL
17
7
0
01 Jun 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Qinghua Liu
Gellert Weisz
András György
Chi Jin
Csaba Szepesvári
OffRL
16
8
0
18 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
17
26
0
15 May 2023
Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
Yuan-Chia Cheng
Ruiquan Huang
J. Yang
Yitao Liang
OffRL
37
8
0
20 Mar 2023
Best of Both Worlds Policy Optimization
Christoph Dann
Chen-Yu Wei
Julian Zimmert
11
12
0
18 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
54
15
0
30 Jan 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
9
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
16
12
0
30 Jan 2023
Sample Efficient Deep Reinforcement Learning via Local Planning
Dong Yin
S. Thiagarajan
N. Lazić
Nived Rajaraman
Botao Hao
Csaba Szepesvári
13
4
0
29 Jan 2023
Latent Variable Representation for Reinforcement Learning
Tongzheng Ren
Chenjun Xiao
Tianjun Zhang
Na Li
Zhaoran Wang
Sujay Sanghavi
Dale Schuurmans
Bo Dai
OffRL
11
10
0
17 Dec 2022
CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Xiang Zheng
Xingjun Ma
Cong Wang
16
1
0
28 Nov 2022
Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization
Carlo Alfano
Patrick Rebeschini
49
13
0
30 Sep 2022
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu
Lingxiao Wang
Chenjia Bai
Zhuoran Yang
Zhaoran Wang
SSL
OffRL
8
32
0
29 Jul 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Dongsheng Ding
K. Zhang
Jiali Duan
Tamer Başar
Mihailo R. Jovanović
13
19
0
06 Jun 2022
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning
Andrea Zanette
Martin J. Wainwright
OOD
19
5
0
01 Jun 2022
Provable Benefits of Representational Transfer in Reinforcement Learning
Alekh Agarwal
Yuda Song
Wen Sun
Kaiwen Wang
Mengdi Wang
Xuezhou Zhang
OffRL
16
33
0
29 May 2022
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Xuezhou Zhang
Yuda Song
Masatoshi Uehara
Mengdi Wang
Alekh Agarwal
Wen Sun
OffRL
11
57
0
31 Jan 2022
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
23
14
0
21 Dec 2021
On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding
Junzi Zhang
Javad Lavaei
17
16
0
19 Oct 2021
Representation Learning for Online and Offline RL in Low-rank MDPs
Masatoshi Uehara
Xuezhou Zhang
Wen Sun
OffRL
25
125
0
09 Oct 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
11
111
0
19 Aug 2021
Efficient Local Planning with Linear Function Approximation
Dong Yin
Botao Hao
Yasin Abbasi-Yadkori
N. Lazić
Csaba Szepesvári
27
18
0
12 Aug 2021
Design of Experiments for Stochastic Contextual Linear Bandits
Andrea Zanette
Kefan Dong
Jonathan Lee
Emma Brunskill
OffRL
17
17
0
21 Jul 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
14
44
0
18 Jul 2021
MADE: Exploration via Maximizing Deviation from Explored Regions
Tianjun Zhang
Paria Rashidinejad
Jiantao Jiao
Yuandong Tian
Joseph E. Gonzalez
Stuart J. Russell
OffRL
13
42
0
18 Jun 2021
Corruption-Robust Offline Reinforcement Learning
Xuezhou Zhang
Yiding Chen
Jerry Zhu
Wen Sun
OffRL
18
39
0
11 Jun 2021
Navigating to the Best Policy in Markov Decision Processes
Aymen Al Marjani
Aurélien Garivier
Alexandre Proutière
14
20
0
05 Jun 2021
Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
Chi Jin
Zhuoran Yang
Zhaoran Wang
OffRL
107
166
0
06 Jan 2021
Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang
Ruosong Wang
S. Du
A. Krishnamurthy
127
135
0
09 Dec 2019