Is Pessimism Provably Efficient for Offline RL?

International Conference on Machine Learning (ICML), 2020
30 December 2020
Ying Jin
Zhuoran Yang
Zhaoran Wang
    OffRL

Papers citing "Is Pessimism Provably Efficient for Offline RL?"

50 / 290 papers shown
Offline Clustering of Preference Learning with Active-data Augmentation
Jingyuan Liu
Fatemeh Ghaffari
Xuchuang Wang
Xutong Liu
Mohammad Hajiesmaili
Carlee Joe-Wong
OffRL
118
0
0
30 Oct 2025
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
58
0
0
28 Oct 2025
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Jongmin Lee
Ernest K. Ryu
OffRL
72
0
0
20 Oct 2025
Offline and Online KL-Regularized RLHF under Differential Privacy
Yulian Wu
Rushil Thareja
Praneeth Vepakomma
Francesco Orabona
OffRL
68
0
0
15 Oct 2025
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
Yuchun Miao
Liang Ding
Sen Zhang
Rong Bao
L. Zhang
Dacheng Tao
132
0
0
15 Oct 2025
Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death
Sihyung Park
Wenbin Lu
Shu Yang
32
0
0
08 Oct 2025
Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
Ziyi Chen
Junyi Li
Qi He
Heng-Chiao Huang
92
0
0
07 Oct 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
140
9
0
05 Oct 2025
Generalized Fitted Q-Iteration with Clustered Data
Liyuan Hu
Jitao Wang
Zhenke Wu
C. Shi
OffRL
96
0
0
04 Oct 2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di
Kaixuan Ji
Xuheng Li
Heyang Zhao
Quanquan Gu
68
0
0
03 Oct 2025
PASTA: A Unified Framework for Offline Assortment Learning
Juncheng Dong
Weibin Mo
Zhengling Qi
C. Shi
Ethan X. Fang
Vahid Tarokh
OffRL
86
0
0
02 Oct 2025
Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback
Shreyas Chaudhari
Renhao Zhang
Philip S. Thomas
Bruno Castro da Silva
OffRL
100
1
0
30 Sep 2025
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Xiaobo Wang
Zixia Jia
Jiaqi Li
Qi Liu
Zilong Zheng
76
0
0
03 Sep 2025
A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory
Fengdi Che
OffRL
88
0
0
11 Aug 2025
Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models
Katherine Avery
Chinmay Pendse
David D. Jensen
OffRL
84
0
0
04 Aug 2025
Statistical and Algorithmic Foundations of Reinforcement Learning
Yuejie Chi
Yuxin Chen
Yuting Wei
OffRL
161
2
0
19 Jul 2025
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Ruiquan Huang
Donghao Li
Chengshuai Shi
Cong Shen
Jing Yang
OffRL
369
0
0
01 Jul 2025
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Andrew Wagenmaker
Mitsuhiko Nakamoto
Yunchu Zhang
S. Park
Waleed Yagoub
Anusha Nagabandi
Abhishek Gupta
Sergey Levine
OffRL
257
21
0
18 Jun 2025
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary
Wassim Uddin Mondal
Laxmidhar Behera
OffRL
223
1
0
11 Jun 2025
How to Provably Improve Return Conditioned Supervised Learning?
Zhishuai Liu
Yu Yang
Ruhan Wang
Pan Xu
Dongruo Zhou
OffRL
111
1
0
10 Jun 2025
Generalized Linear Markov Decision Process
Sinian Zhang
Kaicheng Zhang
Ziping Xu
Tianxi Cai
D. Zhou
172
0
0
01 Jun 2025
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
190
1
0
26 May 2025
Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
Chi Zhang
Ziying Jia
George Atia
Sihong He
Yue Wang
321
1
0
24 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
174
1
0
23 May 2025
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Junwon Seo
Kensuke Nakamura
Andrea V. Bajcsy
326
8
0
01 May 2025
Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning
Ke Jiang
Wen Jiang
You Li
Xiaoyang Tan
OffRL
264
0
0
02 Apr 2025
NeuroSep-CP-LCB: A Deep Learning-based Contextual Multi-armed Bandit Algorithm with Uncertainty Quantification for Early Sepsis Prediction
Anni Zhou
Raheem Beyah
Rishikesan Kamaleswaran
220
1
0
20 Mar 2025
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
237
2
0
10 Mar 2025
Clustered KL-barycenter design for policy evaluation
Simon Weissmann
Till Freihaut
Claire Vernade
Giorgia Ramponi
Leif Döring
OffRL
213
1
0
04 Mar 2025
Behavior Preference Regression for Offline Reinforcement Learning
Padmanaba Srinivasan
William J. Knottenbelt
OffRL
150
0
0
02 Mar 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
493
4
0
26 Feb 2025
Learning an Optimal Assortment Policy under Observational Data
Yuxuan Han
Han Zhong
Miao Lu
Jose H. Blanchet
Zhengyuan Zhou
OffRL
498
1
0
10 Feb 2025
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Abdullah Akgul
Manuel Haußmann
M. Kandemir
OffRL
493
0
0
17 Jan 2025
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
International Conference on Learning Representations (ICLR), 2024
Matthew D Riemer
G. Subbaraj
Glen Berseth
Irina Rish
OffRL
255
4
0
18 Dec 2024
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
OffRL
LRM
896
1
0
31 Oct 2024
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation
Neural Information Processing Systems (NeurIPS), 2024
Momin Haider
Ming Yin
Menglei Zhang
Arpit Gupta
Jing Zhu
Yu-Xiang Wang
OffRL
142
2
0
30 Oct 2024
Uncertainty-Penalized Direct Preference Optimization
Sam Houliston
Alizée Pace
Alexander Immer
Gunnar Rätsch
128
0
0
26 Oct 2024
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Chen Yang
Chenyang Zhao
Q. Gu
Dongruo Zhou
LRM
177
4
0
22 Oct 2024
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
International Conference on Learning Representations (ICLR), 2024
C. Voelcker
Marcel Hussing
Eric Eaton
Amir-massoud Farahmand
Igor Gilitschenski
328
10
0
11 Oct 2024
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu
Boyuan Wang
Yuhang Jiang
Jianzhun Shao
Yixiu Mao
Cheems Wang
Chang Liu
Xiangyang Ji
274
9
0
03 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
International Conference on Learning Representations (ICLR), 2024
Jie Cheng
Ruixi Qiao
Gang Xiong
Binhua Li
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
261
7
0
01 Oct 2024
Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning
American Control Conference (ACC), 2024
Vanshaj Khattar
Ming Jin
OffRL
160
0
0
27 Aug 2024
Mitigating Distribution Shift in Model-based Offline RL via Shifts-aware Reward Learning
Wang Luo
Haoran Li
Zicheng Zhang
Congying Han
Chi Zhou
Jiayu Lv
Tiande Guo
OffRL
382
1
0
23 Aug 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
399
1
0
22 Aug 2024
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
Neural Information Processing Systems (NeurIPS), 2024
Yun Qu
Boyuan Wang
Jianzhun Shao
Yuhang Jiang
Chen Chen
...
Qiang Fu
Wei Yang
Guang Yang
Lanxiao Huang
Xiangyang Ji
OffRL
159
14
0
20 Aug 2024
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
Neural Information Processing Systems (NeurIPS), 2024
Ying Fan
Jingling Li
Adith Swaminathan
Aditya Modi
Ching-An Cheng
OffRL
272
0
0
14 Aug 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Neural Information Processing Systems (NeurIPS), 2024
Kevin Tan
Wei Fan
Yuting Wei
OffRL
246
5
0
08 Aug 2024
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
Hao-ming Lin
Wenhao Ding
Jian Chen
Laixi Shi
Jiacheng Zhu
Yue Liu
Ding Zhao
OffRL
CML
375
3
0
15 Jul 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
194
1
0
10 Jul 2024
Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
Ori Linial
Guy Tennenholtz
Uri Shalit
OffRL
210
1
0
30 Jun 2024