Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2108.08812
Cited By
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
19 August 2021
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning"
50 / 90 papers shown
Title
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
71
0
0
10 Mar 2025
On The Statistical Complexity of Offline Decision-Making
Thanh Nguyen-Tang
R. Arora
OffRL
33
1
0
10 Jan 2025
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation
Momin Haider
Ming Yin
Menglei Zhang
Arpit Gupta
Jing Zhu
Yu-Xiang Wang
OffRL
26
1
0
30 Oct 2024
Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
Zeyang Liu
Xinrui Yang
Shiguang Sun
Long Qian
Lipeng Wan
Xingyu Chen
Xuguang Lan
22
2
0
03 Oct 2024
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
46
4
0
03 Oct 2024
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
Yun Qu
Boyuan Wang
Jianzhun Shao
Yuhang Jiang
Chen Chen
...
Qiang Fu
Wei Yang
Guang Yang
Lanxiao Huang
Xiangyang Ji
OffRL
39
9
0
20 Aug 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Kevin Tan
Wei Fan
Yuting Wei
OffRL
66
2
0
08 Aug 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Shuang Qiu
Mladen Kolar
Tong Zhang
OffRL
30
0
0
10 Jul 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Shenao Zhang
Donghan Yu
Hiteshi Sharma
Ziyi Yang
Shuohang Wang
Hany Hassan
Zhaoran Wang
LRM
40
28
0
29 May 2024
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
73
21
0
29 May 2024
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park
Mingyang Liu
Dingwen Kong
Kaiqing Zhang
Asuman Ozdaglar
28
28
0
30 Apr 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
B. Kveton
31
0
0
22 Apr 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
21
1
0
04 Mar 2024
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
Ruiqi Zhang
Yuexiang Zhai
Andrea Zanette
41
0
0
24 Feb 2024
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
27
4
0
22 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
38
9
0
11 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
24
3
0
08 Feb 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
39
0
0
21 Jan 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
25
3
0
06 Jan 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
36
155
0
18 Dec 2023
The Generalization Gap in Offline Reinforcement Learning
Ishita Mediratta
Qingfei You
Minqi Jiang
Roberta Raileanu
OffRL
73
10
0
10 Dec 2023
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Chen Ye
Rui Yang
Quanquan Gu
Tong Zhang
OffRL
25
16
0
23 Oct 2023
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Rui Yang
Han Zhong
Jiawei Xu
Amy Zhang
Chong Zhang
Lei Han
Tong Zhang
OffRL
OnRL
41
15
0
19 Oct 2023
Bi-Level Offline Policy Optimization with Limited Exploration
Wenzhuo Zhou
OffRL
34
4
0
10 Oct 2023
Sample-Efficient Multi-Agent RL: An Optimization Perspective
Nuoya Xiong
Zhihan Liu
Zhaoran Wang
Zhuoran Yang
36
1
0
10 Oct 2023
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
Qiwei Di
Heyang Zhao
Jiafan He
Quanquan Gu
OffRL
42
5
0
02 Oct 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
19
0
0
28 Sep 2023
Zero-Shot Reinforcement Learning from Low Quality Data
Scott Jeen
Tom Bewley
Jonathan M. Cullen
OffRL
OnRL
32
0
0
26 Sep 2023
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Jianzhun Shao
Yun Qu
Chen Chen
Hongchang Zhang
Xiangyang Ji
OffRL
8
19
0
22 Sep 2023
Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War
Zeyang Jia
Eli Ben-Michael
Kosuke Imai
19
4
0
17 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
30
5
0
10 Jul 2023
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
Chengshuai Shi
Wei Xiong
Cong Shen
Jing Yang
OffRL
25
3
0
14 Jun 2023
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Zeyu Zhang
Yi-Hsun Su
Hui Yuan
Yiran Wu
R. Balasubramanian
Qingyun Wu
Huazheng Wang
Mengdi Wang
OffRL
CML
23
4
0
13 Jun 2023
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Kaiwen Wang
Kevin Zhou
Runzhe Wu
Nathan Kallus
Wen Sun
OffRL
21
17
0
25 May 2023
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Germano Gabbianelli
Gergely Neu
Nneka Okolo
Matteo Papini
OffRL
21
7
0
22 May 2023
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Jose H. Blanchet
Miao Lu
Tong Zhang
Han Zhong
OffRL
37
29
0
16 May 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto
Yuexiang Zhai
Anika Singh
Max Sobol Mark
Yi-An Ma
Chelsea Finn
Aviral Kumar
Sergey Levine
OffRL
OnRL
109
108
0
09 Mar 2023
Adversarial Model for Offline Reinforcement Learning
M. Bhardwaj
Tengyang Xie
Byron Boots
Nan Jiang
Ching-An Cheng
AAML
OffRL
27
25
0
21 Feb 2023
Offline Learning in Markov Games with General Function Approximation
Yuheng Zhang
Yunru Bai
Nan Jiang
OffRL
8
8
0
06 Feb 2023
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee
Alekh Agarwal
Christoph Dann
Tong Zhang
18
19
0
31 Jan 2023
STEEL: Singularity-aware Reinforcement Learning
Xiaohong Chen
Zhengling Qi
Runzhe Wan
OffRL
14
2
0
30 Jan 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
25
15
0
30 Jan 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
K
K
K
-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
20
174
0
26 Jan 2023
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency
Wenlong Mou
Peng Ding
Martin J. Wainwright
Peter L. Bartlett
OffRL
6
10
0
16 Jan 2023
Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality
Ying Jin
Zhimei Ren
Zhuoran Yang
Zhaoran Wang
OffRL
19
25
0
19 Dec 2022
State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
C. L. P. Chen
Hongyao Tang
Yi-An Ma
Chao Wang
Qianli Shen
Dong Li
Jianye Hao
OffRL
13
0
0
28 Nov 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
45
15
0
23 Nov 2022
Leveraging Offline Data in Online Reinforcement Learning
Andrew Wagenmaker
Aldo Pacchiano
OffRL
OnRL
22
36
0
09 Nov 2022
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
Takumi Tanabe
Reimi Sato
Kazuto Fukuchi
Jun Sakuma
Youhei Akimoto
OffRL
11
8
0
07 Nov 2022
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Paria Rashidinejad
Hanlin Zhu
Kunhe Yang
Stuart J. Russell
Jiantao Jiao
OffRL
33
26
0
01 Nov 2022
1
2
Next