ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.06799
  4. Cited By
The Importance of Pessimism in Fixed-Dataset Policy Optimization

The Importance of Pessimism in Fixed-Dataset Policy Optimization

15 September 2020
Jacob Buckman
Carles Gelada
Marc G. Bellemare
    OffRL
ArXivPDFHTML

Papers citing "The Importance of Pessimism in Fixed-Dataset Policy Optimization"

50 / 90 papers shown
Title
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
73
0
0
10 Mar 2025
Choices are More Important than Efforts: LLM Enables Efficient
  Multi-Agent Exploration
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
46
4
0
03 Oct 2024
Hokoff: Real Game Dataset from Honor of Kings and its Offline
  Reinforcement Learning Benchmarks
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
Yun Qu
Boyuan Wang
Jianzhun Shao
Yuhang Jiang
Chen Chen
...
Qiang Fu
Wei Yang
Guang Yang
Lanxiao Huang
Xiangyang Ji
OffRL
46
9
0
20 Aug 2024
Combining Experimental and Historical Data for Policy Evaluation
Combining Experimental and Historical Data for Policy Evaluation
Ting Li
Chengchun Shi
Qianglin Wen
Yang Sui
Yongli Qin
Chunbo Lai
Hongtu Zhu
OffRL
46
0
0
01 Jun 2024
Cross-Validated Off-Policy Evaluation
Cross-Validated Off-Policy Evaluation
Matej Cief
B. Kveton
Michal Kompan
OffRL
20
1
0
24 May 2024
Offline Multi-task Transfer RL with Representational Penalization
Offline Multi-task Transfer RL with Representational Penalization
Avinandan Bose
S. S. Du
Maryam Fazel
OffRL
49
12
0
19 Feb 2024
Long-term Safe Reinforcement Learning with Binary Feedback
Long-term Safe Reinforcement Learning with Binary Feedback
Akifumi Wachi
Wataru Hashimoto
Kazumune Hashimoto
OffRL
25
3
0
08 Jan 2024
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement
  Learning?
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Gunshi Gupta
Tim G. J. Rudner
R. McAllister
Adrien Gaidon
Y. Gal
OffRL
48
3
0
28 Dec 2023
Bridging Distributionally Robust Learning and Offline RL: An Approach to
  Mitigate Distribution Shift and Partial Data Coverage
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
Kishan Panaganti
Zaiyan Xu
D. Kalathil
Mohammad Ghavamzadeh
OOD
OffRL
34
6
0
27 Oct 2023
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Noveen Sachdeva
Lequn Wang
Dawen Liang
Nathan Kallus
Julian McAuley
OffRL
30
12
0
24 Oct 2023
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
  Datasets
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Zhang-Wei Hong
Aviral Kumar
Sathwik Karnik
Abhishek Bhandwaldar
Akash Srivastava
J. Pajarinen
Romain Laroche
Abhishek Gupta
Pulkit Agrawal
OffRL
38
19
0
06 Oct 2023
Counterfactual Conservative Q Learning for Offline Multi-agent
  Reinforcement Learning
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Jianzhun Shao
Yun Qu
Chen Chen
Hongchang Zhang
Xiangyang Ji
OffRL
13
19
0
22 Sep 2023
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with
  Expert Guidance
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
Qisen Yang
Shenzhi Wang
Qihang Zhang
Gao Huang
Shiji Song
OffRL
OnRL
24
8
0
04 Sep 2023
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Laixi Shi
Robert Dadashi
Yuejie Chi
P. S. Castro
M. Geist
OffRL
27
5
0
25 Jul 2023
Provable Benefits of Policy Learning from Human Preferences in
  Contextual Bandit Problems
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji
Huazheng Wang
Minshuo Chen
Tuo Zhao
Mengdi Wang
OffRL
32
6
0
24 Jul 2023
Bayesian Safe Policy Learning with Chance Constrained Optimization:
  Application to Military Security Assessment during the Vietnam War
Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War
Zeyang Jia
Eli Ben-Michael
Kosuke Imai
24
4
0
17 Jul 2023
Probabilistic Counterexample Guidance for Safer Reinforcement Learning
  (Extended Version)
Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)
Xiaotong Ji
Antonio Filieri
OffRL
15
1
0
10 Jul 2023
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory
  Weighting
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Zhang-Wei Hong
Pulkit Agrawal
Rémi Tachet des Combes
Romain Laroche
OffRL
29
17
0
22 Jun 2023
Unified Off-Policy Learning to Rank: a Reinforcement Learning
  Perspective
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Zeyu Zhang
Yi-Hsun Su
Hui Yuan
Yiran Wu
R. Balasubramanian
Qingyun Wu
Huazheng Wang
Mengdi Wang
OffRL
CML
36
4
0
13 Jun 2023
Decoupled Prioritized Resampling for Offline RL
Decoupled Prioritized Resampling for Offline RL
Yang Yue
Bingyi Kang
Xiao Ma
Qisen Yang
Gao Huang
S. Song
Shuicheng Yan
OffRL
25
0
0
08 Jun 2023
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden
  Confounding
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
Alizée Pace
Hugo Yèche
Bernhard Schölkopf
Gunnar Rätsch
Guy Tennenholtz
OffRL
16
6
0
01 Jun 2023
Balancing policy constraint and ensemble size in uncertainty-based
  offline reinforcement learning
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning
Alex Beeson
Giovanni Montana
OffRL
24
13
0
26 Mar 2023
A Unified Framework of Policy Learning for Contextual Bandit with
  Confounding Bias and Missing Observations
A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations
Siyu Chen
Yitan Wang
Zhaoran Wang
Zhuoran Yang
OffRL
28
2
0
20 Mar 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online
  Fine-Tuning
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto
Yuexiang Zhai
Anika Singh
Max Sobol Mark
Yi-An Ma
Chelsea Finn
Aviral Kumar
Sergey Levine
OffRL
OnRL
112
108
0
09 Mar 2023
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function
  Approximation
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
Thanh Nguyen-Tang
R. Arora
OffRL
46
5
0
24 Feb 2023
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning
  for Task-oriented Dialogue Systems
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems
Yihao Feng
Shentao Yang
Shujian Zhang
Jianguo Zhang
Caiming Xiong
Mi Zhou
Haiquan Wang
OffRL
26
24
0
20 Feb 2023
Conservative State Value Estimation for Offline Reinforcement Learning
Conservative State Value Estimation for Offline Reinforcement Learning
Liting Chen
Jie Yan
Zhengdao Shao
Lu Wang
Qingwei Lin
Saravan Rajmohan
Thomas Moscibroda
Dongmei Zhang
OffRL
16
5
0
14 Feb 2023
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Zhixuan Liang
Yao Mu
Mingyu Ding
Fei Ni
M. Tomizuka
Ping Luo
69
99
0
03 Feb 2023
Selective Uncertainty Propagation in Offline RL
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
B. Kveton
A. Rangi
OffRL
59
0
0
01 Feb 2023
Model-based Offline Reinforcement Learning with Local Misspecification
Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong
Yannis Flet-Berliac
Allen Nie
Emma Brunskill
OffRL
18
4
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskil
Philip S. Thomas
OffRL
42
6
0
24 Jan 2023
Risk Sensitive Dead-end Identification in Safety-Critical Offline
  Reinforcement Learning
Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
Taylor W. Killian
S. Parbhoo
Marzyeh Ghassemi
OffRL
18
6
0
13 Jan 2023
Policy learning "without'' overlap: Pessimism and generalized empirical
  Bernstein's inequality
Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality
Ying Jin
Zhimei Ren
Zhuoran Yang
Zhaoran Wang
OffRL
24
25
0
19 Dec 2022
Offline Robot Reinforcement Learning with Uncertainty-Guided Human
  Expert Sampling
Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling
Ashish Kumar
Ilya Kuzovkin
OffRL
OnRL
32
1
0
16 Dec 2022
Multi-Task Off-Policy Learning from Bandit Feedback
Multi-Task Off-Policy Learning from Bandit Feedback
Joey Hong
B. Kveton
S. Katariya
Manzil Zaheer
Mohammad Ghavamzadeh
OffRL
28
10
0
09 Dec 2022
TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from
  Mixed Datasets
TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets
Yuanying Cai
Chuheng Zhang
Li Zhao
Wei Shen
Xuyun Zhang
Lei Song
Jiang Bian
Tao Qin
Tie-Yan Liu
OffRL
17
3
0
05 Dec 2022
State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement
  Learning
State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
C. L. P. Chen
Hongyao Tang
Yi-An Ma
Chao Wang
Qianli Shen
Dong Li
Jianye Hao
OffRL
26
0
0
28 Nov 2022
Domain Generalization for Robust Model-Based Offline Reinforcement
  Learning
Domain Generalization for Robust Model-Based Offline Reinforcement Learning
Alan Clark
Shoaib Ahmed Siddiqui
Robert Kirk
Usman Anwar
Stephen Chung
David M. Krueger
OOD
OffRL
25
0
0
27 Nov 2022
Behavior Prior Representation learning for Offline Reinforcement
  Learning
Behavior Prior Representation learning for Offline Reinforcement Learning
Hongyu Zang
Xin Li
Jie Yu
Chen Liu
Riashat Islam
Rémi Tachet des Combes
Romain Laroche
OffRL
OnRL
35
10
0
02 Nov 2022
Agent-Controller Representations: Principled Offline RL with Rich
  Exogenous Information
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
Riashat Islam
Manan Tomar
Alex Lamb
Yonathan Efroni
Hongyu Zang
...
Dipendra Kumar Misra
Xin-hui Li
H. V. Seijen
Rémi Tachet des Combes
John Langford
OffRL
22
6
0
31 Oct 2022
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning
  Approach
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach
Yunzhe Zhou
Zhengling Qi
C. Shi
Lexin Li
OffRL
10
8
0
26 Oct 2022
Boosting Offline Reinforcement Learning via Data Rebalancing
Boosting Offline Reinforcement Learning via Data Rebalancing
Yang Yue
Bingyi Kang
Xiao Ma
Zhongwen Xu
Gao Huang
Shuicheng Yan
OffRL
18
22
0
17 Oct 2022
Semi-supervised Batch Learning From Logged Data
Semi-supervised Batch Learning From Logged Data
Gholamali Aminian
Armin Behnamnia
R. Vega
Laura Toni
Chengchun Shi
Hamid R. Rabiee
Omar Rivasplata
Miguel R. D. Rodrigues
OffRL
26
0
0
15 Sep 2022
Online Learning with Off-Policy Feedback
Online Learning with Off-Policy Feedback
Germano Gabbianelli
Matteo Papini
Gergely Neu
OffRL
18
4
0
18 Jul 2022
Offline Policy Optimization with Eligible Actions
Offline Policy Optimization with Eligible Actions
Yao Liu
Yannis Flet-Berliac
Emma Brunskill
OffRL
17
5
0
01 Jul 2022
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online
  Reinforcement Learning
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
Haoyi Niu
Shubham Sharma
Yiwen Qiu
Ming Li
Guyue Zhou
Jianming Hu
Xianyuan Zhan
OffRL
OnRL
27
46
0
27 Jun 2022
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise
  Reward
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
Tengyu Xu
Yue Wang
Shaofeng Zou
Yingbin Liang
OffRL
28
12
0
13 Jun 2022
Model-based Offline Imitation Learning with Non-expert Data
Model-based Offline Imitation Learning with Non-expert Data
Jeongwon Park
Lin F. Yang
OffRL
32
1
0
11 Jun 2022
Incorporating Explicit Uncertainty Estimates into Deep Offline
  Reinforcement Learning
Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
David Brandfonbrener
Rémi Tachet des Combes
Romain Laroche
OffRL
29
5
0
02 Jun 2022
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in
  Offline RL
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
Wonjoon Goo
S. Niekum
OffRL
19
20
0
01 Jun 2022
12
Next