ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.13890
  4. Cited By
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
  Optimal Sample Complexity

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

28 February 2022
Laixi Shi
Gen Li
Yuting Wei
Yuxin Chen
Yuejie Chi
    OffRL
ArXivPDFHTML

Papers citing "Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity"

20 / 20 papers shown
Title
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
60
23
0
20 Feb 2025
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
70
2
0
10 Oct 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
45
1
0
11 Jun 2024
What Are the Odds? Improving the foundations of Statistical Model Checking
What Are the Odds? Improving the foundations of Statistical Model Checking
Tobias Meggendorfer
Maximilian Weininger
Patrick Wienhoft
19
4
0
08 Apr 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy
  Coverage Suffices
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
24
3
0
08 Feb 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
39
0
0
21 Jan 2024
Settling the Sample Complexity of Online Reinforcement Learning
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
90
21
0
25 Jul 2023
Offline Meta Reinforcement Learning with In-Distribution Online
  Adaptation
Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Jianhao Wang
Jin Zhang
Haozhe Jiang
Junyu Zhang
Liwei Wang
Chongjie Zhang
OffRL
19
9
0
31 May 2023
High-probability sample complexities for policy evaluation with linear
  function approximation
High-probability sample complexities for policy evaluation with linear function approximation
Gen Li
Weichen Wu
Yuejie Chi
Cong Ma
Alessandro Rinaldo
Yuting Wei
OffRL
18
6
0
30 May 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via
  Pessimism
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
29
52
0
29 May 2023
Double Pessimism is Provably Efficient for Distributionally Robust
  Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Jose H. Blanchet
Miao Lu
Tong Zhang
Han Zhong
OffRL
37
29
0
16 May 2023
Offline Learning in Markov Games with General Function Approximation
Offline Learning in Markov Games with General Function Approximation
Yuheng Zhang
Yunru Bai
Nan Jiang
OffRL
8
8
0
06 Feb 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
13
5
0
05 Feb 2023
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
Fan Chen
Junyu Zhang
Zaiwen Wen
OffRL
18
8
0
13 Jul 2022
Stabilizing Q-learning with Linear Architectures for Provably Efficient
  Learning
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning
Andrea Zanette
Martin J. Wainwright
OOD
23
5
0
01 Jun 2022
Non-Markovian policies occupancy measures
Non-Markovian policies occupancy measures
Romain Laroche
Rémi Tachet des Combes
Jacob Buckman
OffRL
14
1
0
27 May 2022
Near-optimal Offline Reinforcement Learning with Linear Representation:
  Leveraging Variance Information with Pessimism
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Ming Yin
Yaqi Duan
Mengdi Wang
Yu-Xiang Wang
OffRL
19
65
0
11 Mar 2022
Pessimistic Model-based Offline Reinforcement Learning under Partial
  Coverage
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Masatoshi Uehara
Wen Sun
OffRL
91
144
0
13 Jul 2021
COMBO: Conservative Offline Model-Based Policy Optimization
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
214
413
0
16 Feb 2021
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on
  Open Problems
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
329
1,944
0
04 May 2020
1