2012.15085
Is Pessimism Provably Efficient for Offline RL?
International Conference on Machine Learning (ICML), 2020
30 December 2020
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
Papers citing
"Is Pessimism Provably Efficient for Offline RL?"
50 / 290 papers shown
Title
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace
Bernhard Schölkopf
Gunnar Rätsch
Giorgia Ramponi
OffRL
223
1
0
26 Jun 2024
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Neural Information Processing Systems (NeurIPS), 2024
Rui Yang
Ruomeng Ding
Yong Lin
Huan Zhang
Tong Zhang
255
96
0
14 Jun 2024
Structured Difference-of-Q via Orthogonal Learning
Defu Cao
Angela Zhou
334
0
0
12 Jun 2024
Strategically Conservative Q-Learning
Yutaka Shimizu
Joey Hong
Sergey Levine
Masayoshi Tomizuka
OffRL
OnRL
222
1
0
06 Jun 2024
Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
216
4
0
05 Jun 2024
Combining Experimental and Historical Data for Policy Evaluation
Ting Li
Chengchun Shi
Qianglin Wen
Yang Sui
Yongli Qin
Chunbo Lai
Hongtu Zhu
OffRL
320
3
0
01 Jun 2024
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Haotian Hu
Yiqin Yang
Jianing Ye
Chengjie Wu
Ziqing Mai
Yujing Hu
Tangjie Lv
Changjie Fan
Qianchuan Zhao
Chongjie Zhang
OffRL
OnRL
203
7
0
31 May 2024
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
International Conference on Machine Learning (ICML), 2024
Fengdi Che
Chenjun Xiao
Jincheng Mei
Bo Dai
Ramki Gummadi
Oscar A Ramirez
Christopher K Harris
A. R. Mahmood
Dale Schuurmans
340
6
0
31 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
327
84
0
26 May 2024
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
Neural Information Processing Systems (NeurIPS), 2024
Otmane Sakhi
Imad Aouali
Pierre Alquier
Nicolas Chopin
OffRL
248
11
0
23 May 2024
State-Constrained Offline Reinforcement Learning
Charles A. Hepburn
Yue Jin
Giovanni Montana
OffRL
319
0
0
23 May 2024
A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim
Jiawei Zhang
Asuman Ozdaglar
P. Parrilo
OffRL
277
2
0
20 May 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2024
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL
AAML
236
3
0
18 May 2024
Learning Decision Policies with Instrumental Variables through Double Machine Learning
International Conference on Machine Learning (ICML), 2024
Daqian Shao
Ashkan Soleymani
Francesco Quinzan
Marta Z. Kwiatkowska
416
2
0
14 May 2024
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning
Changhong Wang
Xudong Yu
Chenjia Bai
Qiaosheng Zhang
Zhen Wang
214
2
0
12 May 2024
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
Chenjia Bai
Lingxiao Wang
Jianye Hao
Zhuoran Yang
Bin Zhao
Zhen Wang
Xuelong Li
OffRL
216
10
0
30 Apr 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
585
96
0
29 Apr 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
Branislav Kveton
348
0
0
22 Apr 2024
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
Minshuo Chen
Song Mei
Jianqing Fan
Mengdi Wang
VLM
MedIm
DiffM
276
81
0
11 Apr 2024
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
Xudong Yu
Chenjia Bai
Hongyi Guo
Changhong Wang
Zhen Wang
OffRL
273
0
0
09 Apr 2024
Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning
Yi Shen
Hanyan Huang
Shan Xie
186
0
0
03 Apr 2024
Diffusion Model for Data-Driven Black-Box Optimization
Zihao Li
Hui Yuan
Kaixuan Huang
Chengzhuo Ni
Yinyu Ye
Minshuo Chen
Mengdi Wang
DiffM
211
19
0
20 Mar 2024
Simple Ingredients for Offline Reinforcement Learning
Edoardo Cetin
Andrea Tirinzoni
Matteo Pirotta
A. Lazaric
Yann Ollivier
Ahmed Touati
OffRL
294
2
0
19 Mar 2024
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
Xiaoying Zhang
Jean-François Ton
Wei Shen
Hongning Wang
Yang Liu
122
19
0
08 Mar 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
162
2
0
04 Mar 2024
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
Ziping Xu
Zifan Xu
Runxuan Jiang
Peter Stone
Ambuj Tewari
292
2
0
03 Mar 2024
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
Ruiqi Zhang
Yuexiang Zhai
Andrea Zanette
337
0
0
24 Feb 2024
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
308
9
0
22 Feb 2024
Offline Multi-task Transfer RL with Representational Penalization
Avinandan Bose
S. S. Du
Maryam Fazel
OffRL
229
12
0
19 Feb 2024
Counterfactual Influence in Markov Decision Processes
Milad Kazemi
Jessica Lally
Ekaterina Tishchenko
Hana Chockler
Nicola Paoletti
271
2
0
13 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Neural Information Processing Systems (NeurIPS), 2024
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
258
30
0
11 Feb 2024
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Kaiwen Wang
Owen Oertell
Alekh Agarwal
Nathan Kallus
Wen Sun
OffRL
279
17
0
11 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
197
8
0
08 Feb 2024
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning
Ruoqing Zhang
Ziwei Luo
Jens Sjölund
Thomas B. Schön
Per Mattsson
258
20
0
06 Feb 2024
Return-Aligned Decision Transformer
Tsunehiko Tanaka
Kenshi Abe
Kaito Ariu
Tetsuro Morimura
Edgar Simo-Serra
OffRL
516
2
0
06 Feb 2024
QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model
Saizhuo Wang
Hang Yuan
Lionel M. Ni
Jian Guo
LLMAG
AIFin
97
24
0
06 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
217
44
0
29 Jan 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
236
1
0
21 Jan 2024
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuanzhao Zhai
Yiying Li
Zijian Gao
Xudong Gong
Kele Xu
Dawei Feng
Bo Ding
Huaimin Wang
OffRL
126
3
0
11 Jan 2024
Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces
Neural Information Processing Systems (NeurIPS), 2024
Yaqi Duan
Martin J. Wainwright
OffRL
143
3
0
10 Jan 2024
Long-term Safe Reinforcement Learning with Binary Feedback
AAAI Conference on Artificial Intelligence (AAAI), 2024
Akifumi Wachi
Wataru Hashimoto
Kazumune Hashimoto
OffRL
306
6
0
08 Jan 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
230
5
0
06 Jan 2024
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
246
1
0
19 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
323
289
0
18 Dec 2023
AI in Pharma for Personalized Sequential Decision-Making: Methods, Applications and Opportunities
Yuhan Li
Hongtao Zhang
Keaven M Anderson
Songzi Li
Ruoqing Zhu
137
0
0
30 Nov 2023
Risk-sensitive Markov Decision Process and Learning under General Utility Functions
Social Science Research Network (SSRN), 2023
Zhengqi Wu
Renyuan Xu
176
4
0
22 Nov 2023
Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
International Conference on Machine Learning (ICML), 2023
Kai Yan
Alex Schwing
Yu-Xiong Wang
OffRL
256
1
0
02 Nov 2023
A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories
Neural Information Processing Systems (NeurIPS), 2023
Kai Yan
Alex Schwing
Yu-Xiong Wang
OffRL
252
6
0
02 Nov 2023
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Joey Hong
Anca Dragan
Sergey Levine
OffRL
150
7
0
31 Oct 2023
Robust Offline Reinforcement learning with Heavy-Tailed Rewards
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Jin Zhu
Runzhe Wan
Zhengling Qi
Shuang Luo
C. Shi
OffRL
283
2
0
28 Oct 2023