2106.06926
Bellman-consistent Pessimism for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021
13 June 2021
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
Papers citing
"Bellman-consistent Pessimism for Offline Reinforcement Learning"
50 / 224 papers shown
Title
Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning
Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji
OffRL
24
0
0
04 Nov 2025
Offline Clustering of Preference Learning with Active-data Augmentation
Jingyuan Liu
Fatemeh Ghaffari
Xuchuang Wang
Xutong Liu
Mohammad Hajiesmaili
Carlee Joe-Wong
OffRL
38
0
0
30 Oct 2025
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
12
0
0
28 Oct 2025
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Jongmin Lee
Ernest K. Ryu
OffRL
20
0
0
20 Oct 2025
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
Yuchun Miao
Liang Ding
Sen Zhang
Rong Bao
L. Zhang
Dacheng Tao
48
0
0
15 Oct 2025
Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
Ziyi Chen
Junyi Li
Qi He
Heng-Chiao Huang
48
0
0
07 Oct 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
76
8
0
05 Oct 2025
Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback
Shreyas Chaudhari
Renhao Zhang
Philip S. Thomas
Bruno Castro da Silva
OffRL
56
0
0
30 Sep 2025
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Qiushui Xu
Yuhao Huang
Yushu Jiang
Lei Song
Jinyu Wang
Wenliang Zheng
Jiang Bian
OffRL
32
0
0
28 Sep 2025
A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory
Fengdi Che
OffRL
52
0
0
11 Aug 2025
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Ruiquan Huang
Donghao Li
Chengshuai Shi
Cong Shen
Jing Yang
OffRL
238
0
0
01 Jul 2025
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Andrew Wagenmaker
Mitsuhiko Nakamoto
Yunchu Zhang
S. Park
Waleed Yagoub
Anusha Nagabandi
Abhishek Gupta
Sergey Levine
OffRL
177
16
0
18 Jun 2025
Generalized Linear Markov Decision Process
Sinian Zhang
Kaicheng Zhang
Ziping Xu
Tianxi Cai
D. Zhou
128
0
0
01 Jun 2025
SquareχPO: Differentially Private and Robust χ^2-Preference Optimization in Offline Direct Alignment
Xingyu Zhou
Yulian Wu
Wenqian Weng
Francesco Orabona
225
0
0
27 May 2025
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
143
1
0
26 May 2025
Offline Constrained Reinforcement Learning under Partial Data Coverage
Kihyuk Hong
Ambuj Tewari
OffRL
244
0
0
23 May 2025
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Minh Hoang Nguyen
Linh Le Pham Van
Thommen George Karimpanal
Sunil Gupta
Hung Le
OffRL
LRM
144
1
0
14 May 2025
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data
Annals of Statistics (AoS), 2025
Rui Miao
Babak Shahbaba
Annie Qu
OffRL
211
1
0
14 May 2025
SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2025
Jingkai Xu
Xiangli Nie
135
2
0
22 Apr 2025
Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning
Ke Jiang
Wen Jiang
You Li
Xiaoyang Tan
OffRL
208
0
0
02 Apr 2025
Towards Optimal Offline Reinforcement Learning
Mengmeng Li
Daniel Kuhn
Tobias Sutter
OffRL
217
0
0
15 Mar 2025
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
205
2
0
10 Mar 2025
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
International Conference on Learning Representations (ICLR), 2025
Yuheng Zhang
Nan Jiang
OffRL
139
1
0
03 Mar 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
445
4
0
26 Feb 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
International Conference on Learning Representations (ICLR), 2024
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
386
54
0
20 Feb 2025
Learning an Optimal Assortment Policy under Observational Data
Yuxuan Han
Han Zhong
Miao Lu
Jose H. Blanchet
Zhengyuan Zhou
OffRL
368
1
0
10 Feb 2025
Towards a Sharp Analysis of Offline Policy Learning for f-Divergence-Regularized Contextual Bandits
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
203
1
0
09 Feb 2025
Design Considerations in Offline Preference-based RL
Alekh Agarwal
Christoph Dann
T. V. Marinov
OffRL
203
1
0
08 Feb 2025
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Abdullah Akgul
Manuel Haußmann
M. Kandemir
OffRL
382
0
0
17 Jan 2025
On The Statistical Complexity of Offline Decision-Making
International Conference on Machine Learning (ICML), 2025
Thanh Nguyen-Tang
R. Arora
OffRL
290
2
0
10 Jan 2025
Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning
Knowledge Discovery and Data Mining (KDD), 2025
Tao Liu
Qi Xu
Wei Shi
Zhigang Hua
Shuang Yang
OffRL
149
1
0
09 Jan 2025
LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency
Xiao-Yin Liu
Guotao Li
Xiao-Hu Zhou
Z. Hou
OffRL
177
0
0
30 Dec 2024
Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning
IEEE International Conference on Robotics and Automation (ICRA), 2024
Neel Jawale
Byron Boots
Balakumar Sundaralingam
M. Bhardwaj
197
2
0
27 Nov 2024
Preserving Expert-Level Privacy in Offline Reinforcement Learning
Navodita Sharma
Vishnu Vinod
Abhradeep Thakurta
Alekh Agarwal
Borja Balle
Christoph Dann
A. Raghuveer
OffRL
178
0
0
18 Nov 2024
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
Neural Information Processing Systems (NeurIPS), 2024
Yixiu Mao
Qi Wang
Chen Chen
Yun Qu
Xiangyang Ji
OffRL
365
12
0
25 Oct 2024
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang
Zhihan Liu
Boyi Liu
Yanzhe Zhang
Yingxiang Yang
Yunxing Liu
Liyu Chen
Tao Sun
Ziyi Wang
355
5
0
10 Oct 2024
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
218
8
0
03 Oct 2024
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Wenhao Zhan
Scott Fujimoto
Zheqing Zhu
Jason D. Lee
Daniel Jiang
Yonathan Efroni
OffRL
234
1
0
01 Oct 2024
Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning
Zhishuai Liu
Weixin Wang
Pan Xu
198
12
0
30 Sep 2024
The Central Role of the Loss Function in Reinforcement Learning
Kaiwen Wang
Nathan Kallus
Wen Sun
OffRL
456
10
0
19 Sep 2024
Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning
American Control Conference (ACC), 2024
Vanshaj Khattar
Ming Jin
OffRL
136
0
0
27 Aug 2024
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
Neural Information Processing Systems (NeurIPS), 2024
Yun Qu
Boyuan Wang
Jianzhun Shao
Yuhang Jiang
Chen Chen
...
Qiang Fu
Wei Yang
Guang Yang
Lanxiao Huang
Xiangyang Ji
OffRL
143
13
0
20 Aug 2024
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
Neural Information Processing Systems (NeurIPS), 2024
Ying Fan
Jingling Li
Adith Swaminathan
Aditya Modi
Ching-An Cheng
OffRL
216
0
0
14 Aug 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Neural Information Processing Systems (NeurIPS), 2024
Kevin Tan
Wei Fan
Yuting Wei
OffRL
198
4
0
08 Aug 2024
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review
Masatoshi Uehara
Yulai Zhao
Tommaso Biancalani
Sergey Levine
216
49
0
18 Jul 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
162
1
0
10 Jul 2024
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
Noah Golowich
Ankur Moitra
OffRL
217
3
0
17 Jun 2024
Structured Difference-of-Q via Orthogonal Learning
Defu Cao
Angela Zhou
243
0
0
12 Jun 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
199
4
0
11 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
200
9
0
06 Jun 2024