2106.06926
Cited By
v1
v2
v3
v4
v5
v6 (latest)
Bellman-consistent Pessimism for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021
13 June 2021
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
Papers citing "Bellman-consistent Pessimism for Offline Reinforcement Learning"
50 / 224 papers shown
Title
Combining Experimental and Historical Data for Policy Evaluation
Ting Li
Chengchun Shi
Qianglin Wen
Yang Sui
Yongli Qin
Chunbo Lai
Hongtu Zhu
OffRL
232
3
0
01 Jun 2024
Transfer Q Star: Principled Decoding for LLM Alignment
Souradip Chakraborty
Soumya Suvra Ghosal
Ming Yin
Dinesh Manocha
Mengdi Wang
Amrit Singh Bedi
Furong Huang
193
41
0
30 May 2024
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models
Masatoshi Uehara
Yulai Zhao
Ehsan Hajiramezanali
Gabriele Scalia
Gökçen Eraslan
Avantika Lal
Sergey Levine
Tommaso Biancalani
303
24
0
30 May 2024
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
266
54
0
29 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
208
5
0
28 May 2024
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability
Volodymyr Tkachuk
Gellert Weisz
Csaba Szepesvári
OffRL
101
3
0
27 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
243
78
0
26 May 2024
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yang Zhang
Shixin Yang
Chenjia Bai
Fei Wu
Xiu Li
Zhen Wang
Xuelong Li
LLMAG
347
46
0
23 May 2024
Offline RL via Feature-Occupancy Gradient Ascent
Gergely Neu
Nneka Okolo
OffRL
144
1
0
22 May 2024
A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim
Jiawei Zhang
Asuman Ozdaglar
P. Parrilo
OffRL
205
2
0
20 May 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2024
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL
AAML
167
3
0
18 May 2024
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
Chenjia Bai
Lingxiao Wang
Jianye Hao
Zhuoran Yang
Bin Zhao
Zhen Wang
Xuelong Li
OffRL
168
10
0
30 Apr 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
Branislav Kveton
304
11
0
22 Apr 2024
Offline Trajectory Optimization for Offline Reinforcement Learning
Ziqi Zhao
Zhaochun Ren
Liu Yang
Fajie Yuan
Sudipta Singha Roy
Zhumin Chen
Jun Ma
Xin Xin
OffRL
190
1
0
16 Apr 2024
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
Xudong Yu
Chenjia Bai
Hongyi Guo
Changhong Wang
Zhen Wang
OffRL
197
0
0
09 Apr 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
298
146
0
04 Apr 2024
Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization
T. V. Marinov
Alekh Agarwal
Mircea Trofin
OffRL
103
1
0
28 Mar 2024
Diffusion Model for Data-Driven Black-Box Optimization
Zihao Li
Hui Yuan
Kaixuan Huang
Chengzhuo Ni
Yinyu Ye
Minshuo Chen
Mengdi Wang
DiffM
155
19
0
20 Mar 2024
Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes
He Wang
Laixi Shi
Yuejie Chi
OffRL
180
13
0
19 Mar 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL
OnRL
198
5
0
07 Mar 2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw
Shivam Singhal
Anca Dragan
AAML
234
23
0
05 Mar 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
130
2
0
04 Mar 2024
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
Ziping Xu
Zifan Xu
Runxuan Jiang
Peter Stone
Ambuj Tewari
248
2
0
03 Mar 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
257
112
0
29 Feb 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
194
3
0
28 Feb 2024
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
Ruiqi Zhang
Yuexiang Zhai
Andrea Zanette
205
0
0
24 Feb 2024
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
125
4
0
22 Feb 2024
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Ruijie Zheng
Ching-An Cheng
Hal Daumé
Furong Huang
Andrey Kolobov
164
14
0
16 Feb 2024
Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
Chen Ye
Jiafan He
Quanquan Gu
Tong Zhang
183
10
0
14 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Neural Information Processing Systems (NeurIPS), 2024
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
158
27
0
11 Feb 2024
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Kaiwen Wang
Owen Oertell
Alekh Agarwal
Nathan Kallus
Wen Sun
OffRL
215
17
0
11 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
149
6
0
08 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
165
41
0
29 Jan 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
212
0
0
21 Jan 2024
Exploration and Anti-Exploration with Distributional Random Network Distillation
Kai Yang
Jian Tao
Jiafei Lyu
Xiu Li
310
25
0
18 Jan 2024
Learning from Sparse Offline Datasets via Conservative Density Estimation
International Conference on Learning Representations (ICLR), 2024
Zhepeng Cen
Zuxin Liu
Zitong Wang
Yi-Fan Yao
Henry Lam
Ding Zhao
OffRL
143
9
0
16 Jan 2024
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
J. Kuba
Masatoshi Uehara
Pieter Abbeel
Sergey Levine
AI4CE
208
9
0
08 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
International Conference on Machine Learning (ICML), 2024
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
316
125
0
08 Jan 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
193
5
0
06 Jan 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Neural Information Processing Systems (NeurIPS), 2023
Guhao Feng
Han Zhong
OffRL
178
3
0
28 Dec 2023
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
190
1
0
19 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
175
275
0
18 Dec 2023
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Xiao-Yin Liu
Xiao-Hu Zhou
Guo-Tao Li
Hao Li
Mei-Jiang Gui
Tian-Yu Xiang
De-Xing Huang
Zeng-Guang Hou
OffRL
171
8
0
07 Dec 2023
When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
Vincent Liu
P. Nagarajan
Andrew Patterson
Martha White
OffRL
166
3
0
04 Dec 2023
AI in Pharma for Personalized Sequential Decision-Making: Methods, Applications and Opportunities
Yuhan Li
Hongtao Zhang
Keaven M Anderson
Songzi Li
Ruoqing Zhu
121
0
0
30 Nov 2023
Supported Trust Region Optimization for Offline Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Yongyi Mao
Hongchang Zhang
Chong Chen
Yi Tian Xu
Xiangyang Ji
OffRL
185
20
0
15 Nov 2023
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
International Conference on Learning Representations (ICLR), 2023
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
128
8
0
14 Nov 2023
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
International Conference on Learning Representations (ICLR), 2023
Zhaoyi Zhou
Chuning Zhu
Runlong Zhou
Qiwen Cui
Abhishek Gupta
S. S. Du
OffRL
146
9
0
30 Oct 2023
Robust Offline Reinforcement learning with Heavy-Tailed Rewards
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Jin Zhu
Runzhe Wan
Zhengling Qi
Shuang Luo
C. Shi
OffRL
188
2
0
28 Oct 2023
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Neural Information Processing Systems (NeurIPS), 2023
Chen Ye
Rui Yang
Quanquan Gu
Tong Zhang
OffRL
236
26
0
23 Oct 2023