Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

International Conference on Learning Representations (ICLR), 2022
23 February 2022
Chenjia Bai
Lingxiao Wang
Zhuoran Yang
Zhihong Deng
Animesh Garg
Peng Liu
Zhaoran Wang
    OffRL

Papers citing "Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning"

50 / 101 papers shown
Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
Lipeng Zu
Hansong Zhou
Xiaonan Zhang
OffRL, OnRL
229
0
0
05 Nov 2025
Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning
Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji
OffRL
137
0
0
04 Nov 2025
Online Optimization for Offline Safe Reinforcement Learning
Yassine Chemingui
Aryan Deshwal
Alan Fern
Thanh Nguyen-Tang
J. Doppa
OffRL
108
0
0
24 Oct 2025
Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
Ziyi Chen
Junyi Li
Qi He
Heng-Chiao Huang
144
0
0
07 Oct 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
148
10
0
05 Oct 2025
Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data
H. Just
Myeongseob Ko
Ruoxi Jia
LRM
148
1
0
05 Oct 2025
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary
Wassim Uddin Mondal
Laxmidhar Behera
OffRL
335
2
0
11 Jun 2025
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Qin-Wen Luo
Ming-Kun Xie
Ye-Wen Wang
Sheng-Jun Huang
OffRL
168
0
0
26 May 2025
Decision Flow Policy Optimization
Jifeng Hu
Sili Huang
Siyuan Guo
Zhaogeng Liu
Li Shen
Lichao Sun
Hechang Chen
Yi-Ju Chang
Dacheng Tao
289
0
0
26 May 2025
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Xuyang Chen
Keyu Yan
Wenhan Cao
Tianyuan Chen
OffRL
414
2
0
08 May 2025
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Jifeng Hu
Sili Huang
Zhiyong Yang
Shengchao Hu
Li Shen
Hechang Chen
Lichao Sun
Yi-Ju Chang
Dacheng Tao
OffRL
881
0
0
03 May 2025
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Junwon Seo
Kensuke Nakamura
Andrea V. Bajcsy
374
8
0
01 May 2025
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Haoran Xu
Shuozhe Li
Harshit S. Sikchi
S. Niekum
Amy Zhang
OffRL
323
1
0
17 Apr 2025
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning
Xuyang Chen
Guojian Wang
Keyu Yan
Tianyuan Chen
OffRL
379
1
0
16 Apr 2025
Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning
Ke Jiang
Wen Jiang
You Li
Xiaoyang Tan
OffRL
280
0
0
02 Apr 2025
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation
Hongye Cao
Fan Feng
Jing Huo
Shangdong Yang
Meng Fang
Zhenxing Ge
Yang Gao
AAML, OffRL
205
0
0
26 Mar 2025
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning
Yunkai Gao
Jiaming Guo
Fan Wu
Rui Zhang
OffRL
207
1
0
07 Mar 2025
Data Center Cooling System Optimization Using Offline Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Xianyuan Zhan
Xiangyu Zhu
Peng Cheng
Xiao Hu
Ziteng He
...
Chenhui Liu
Tianshun Hong
Huiwen Zheng
Yunxin Liu
Feng Zhao
AI4CE
379
3
0
17 Feb 2025
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Abdullah Akgul
Manuel Haußmann
M. Kandemir
OffRL
521
0
0
17 Jan 2025
An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces
Alex Beeson
David Ireland
Giovanni Montana
OffRL
334
2
0
17 Nov 2024
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
Neural Information Processing Systems (NeurIPS), 2024
Rui Yang
Jie Wang
Guoping Wu
Yangqiu Song
AAML, OffRL
355
8
0
01 Nov 2024
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Neural Information Processing Systems (NeurIPS), 2024
Jing Zhang
Linjiajie Fang
Kexin Shi
Wenjia Wang
Bing-Yi Jing
OffRL
314
1
0
27 Oct 2024
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
Neural Information Processing Systems (NeurIPS), 2024
Yixiu Mao
Qi Wang
Chen Chen
Yun Qu
Xiangyang Ji
OffRL
522
13
0
25 Oct 2024
Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
Neural Information Processing Systems (NeurIPS), 2024
Zeyang Liu
Xinrui Yang
Shiguang Sun
Long Qian
Lipeng Wan
Xingyu Chen
Xuguang Lan
305
4
0
03 Oct 2024
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
International Conference on Learning Representations (ICLR), 2024
The Viet Bui
Thanh Hong Nguyen
Tien Mai
OffRL
304
4
0
02 Oct 2024
Mitigating Distribution Shift in Model-based Offline RL via Shifts-aware Reward Learning
Wang Luo
Haoran Li
Zicheng Zhang
Congying Han
Chi Zhou
Jiayu Lv
Tiande Guo
OffRL
430
1
0
23 Aug 2024
SelfBC: Self Behavior Cloning for Offline Reinforcement Learning
European Conference on Artificial Intelligence (ECAI), 2024
Shirong Liu
Chenjia Bai
Zixian Guo
Hao Zhang
Gaurav Sharma
Yang Liu
OffRL
240
3
0
04 Aug 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL, AI4CE
187
7
0
26 Jul 2024
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning
Zeyuan Liu
Kai Yang
Xiu Li
OffRL
295
0
0
11 Jun 2024
UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning
Yu Zhang
Rui Yu
Zhipeng Yao
Wenyuan Zhang
Jun Wang
Liming Zhang
OffRL
261
0
0
05 Jun 2024
Combining Experimental and Historical Data for Policy Evaluation
Ting Li
Chengchun Shi
Qianglin Wen
Yang Sui
Yongli Qin
Chunbo Lai
Hongtu Zhu
OffRL
320
3
0
01 Jun 2024
Constrained Ensemble Exploration for Unsupervised Skill Discovery
Chenjia Bai
Rushuai Yang
Qiaosheng Zhang
Kang Xu
Yi Chen
Ting Xiao
Xuelong Li
OffRL
399
7
0
25 May 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Junghyuk Yeom
Yonghyeon Jo
Jungmo Kim
Sanghyeon Lee
Seungyul Han
OffRL
261
3
0
23 May 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2024
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL, AAML
240
3
0
18 May 2024
Reinformer: Max-Return Sequence Modeling for Offline RL
International Conference on Machine Learning (ICML), 2024
Zifeng Zhuang
Dengyun Peng
Jinxin Liu
Ziqi Zhang
Xuetao Zhang
OffRL, AI4TS
289
21
0
14 May 2024
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning
Changhong Wang
Xudong Yu
Chenjia Bai
Qiaosheng Zhang
Zhen Wang
218
2
0
12 May 2024
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Xiaoyu Wen
Chenjia Bai
Kang Xu
Xudong Yu
Yang Zhang
Xuelong Li
Zhen Wang
292
7
0
10 May 2024
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
Chenjia Bai
Lingxiao Wang
Jianye Hao
Zhuoran Yang
Bin Zhao
Zhen Wang
Xuelong Li
OffRL
220
10
0
30 Apr 2024
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
Xudong Yu
Chenjia Bai
Hongyi Guo
Changhong Wang
Zhen Wang
OffRL
273
0
0
09 Apr 2024
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
Yeda Song
Dongwook Lee
Gunhee Kim
OffRL
155
1
0
06 Apr 2024
Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning
Yi Shen
Hanyan Huang
Shan Xie
190
0
0
03 Apr 2024
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning
Ruoqing Zhang
Ziwei Luo
Jens Sjölund
Thomas B. Schön
Per Mattsson
258
20
0
06 Feb 2024
SEABO: A Simple Search-Based Method for Offline Imitation Learning
International Conference on Learning Representations (ICLR), 2024
Jiafei Lyu
Xiaoteng Ma
Le Wan
Runze Liu
Xiu Li
Zongqing Lu
OffRL
278
14
0
06 Feb 2024
ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
Liyuan Mao
Haoran Xu
Weinan Zhang
Xianyuan Zhan
308
20
0
01 Feb 2024
Off-Policy Primal-Dual Safe Reinforcement Learning
International Conference on Learning Representations (ICLR), 2024
Zifan Wu
Bo Tang
Qian Lin
Chao Yu
Shangqin Mao
Qianlong Xie
Xingxing Wang
Dong Wang
OffRL
254
7
0
26 Jan 2024
A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty
Parvin Malekzadeh
Ming Hou
Konstantinos N. Plataniotis
UD
168
5
0
05 Jan 2024
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai
Han Zhang
Yu Lei
Yue Yu
Kele Xu
Dawei Feng
Bo Ding
Huaimin Wang
AI4CE
309
40
0
30 Dec 2023
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Gunshi Gupta
Tim G. J. Rudner
R. McAllister
Adrien Gaidon
Y. Gal
OffRL
183
4
0
28 Dec 2023
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yinmin Zhang
Jie Liu
Chuming Li
Yazhe Niu
Yaodong Yang
Yu Liu
Wanli Ouyang
OffRL, OnRL
295
24
0
12 Dec 2023
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
Carlos E. Luis
A. Bottero
Julia Vinogradska
Felix Berkenkamp
Jan Peters
OffRL
315
3
0
07 Dec 2023