2106.04895
Cited By
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Tengyang Xie
Nan Jiang
Huan Wang
Caiming Xiong
Yu Bai
OffRL
OnRL
Papers citing
"Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning"
50 / 122 papers shown
From Static to Dynamic: Enhancing Offline-to-Online Reinforcement Learning via Energy-Guided Diffusion Stratification
Lipeng Zu
Hansong Zhou
Xiaonan Zhang
OffRL
OnRL
520
0
0
05 Nov 2025
Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
Lipeng Zu
Hansong Zhou
Xiaonan Zhang
OffRL
OnRL
343
1
0
05 Nov 2025
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
149
3
0
28 Oct 2025
Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach
Sebastian Reboul
Hélène Halconruy
Randal Douc
OffRL
167
0
0
22 Oct 2025
Rate optimal learning of equilibria from data
Till Freihaut
Luca Viano
Emanuele Nevali
Volkan Cevher
Matthieu Geist
Giorgia Ramponi
141
0
0
10 Oct 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
249
16
0
05 Oct 2025
Adaptive Policy Backbone via Shared Network
Bumgeun Park
Donghwan Lee
OffRL
OnRL
320
0
0
26 Sep 2025
Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids
Filippo Lazzati
Alberto Maria Metelli
154
0
0
15 Sep 2025
Statistical and Algorithmic Foundations of Reinforcement Learning
Yuejie Chi
Yuxin Chen
Yuting Wei
OffRL
278
3
0
19 Jul 2025
Reinforcement Learning with Action Chunking
Qiyang Li
Zhiyuan Zhou
Sergey Levine
OffRL
OnRL
507
39
0
10 Jul 2025
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Ruiquan Huang
Donghao Li
Chengshuai Shi
Cong Shen
Jing Yang
OffRL
509
0
0
01 Jul 2025
Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
Andreas Schlaginhaufen
Reda Ouhamma
Maryam Kamgarpour
292
4
0
11 Jun 2025
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary
Wassim Uddin Mondal
Laxmidhar Behera
OffRL
483
4
0
11 Jun 2025
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Till Freihaut
Luca Viano
Volkan Cevher
Matthieu Geist
Giorgia Ramponi
352
2
0
23 May 2025
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM
Thang Duong
Minglai Yang
Chicheng Zhang
OffRL
489
3
0
16 May 2025
Offline and Distributional Reinforcement Learning for Wireless Communications
IEEE Communications Magazine (IEEE Commun. Mag.), 2025
Eslam Eldeeb
Hirley Alves
OffRL
241
5
0
04 Apr 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng
Lu Wang
Fangkai Yang
Chen Zhang
Shansong Liu
Wenjie Yin
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
423
16
0
26 Feb 2025
MILE: Model-based Intervention Learning
IEEE International Conference on Robotics and Automation (ICRA), 2025
Yigit Korkmaz
Erdem Bıyık
401
12
0
21 Feb 2025
On The Statistical Complexity of Offline Decision-Making
International Conference on Machine Learning (ICML), 2025
Thanh Nguyen-Tang
R. Arora
OffRL
554
2
0
10 Jan 2025
Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation
Fei Zhao
Xueliang Zhang
288
3
0
25 Dec 2024
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
412
5
0
13 Dec 2024
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Chengrui Qu
Laixi Shi
Kishan Panaganti
Pengcheng You
Adam Wierman
OffRL
OnRL
312
8
0
06 Nov 2024
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson
Qiyang Li
Kevin Frans
Sergey Levine
SSL
OffRL
OnRL
898
8
0
23 Oct 2024
Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
Jifeng Hu
Sili Huang
Li Shen
Zhejian Yang
Shengchao Hu
Shisong Tang
Hechang Chen
Yi Chang
Dacheng Tao
Lichao Sun
OffRL
262
3
0
21 Oct 2024
Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning
Shreyas Muthusamy
Damian Owerko
Charilaos I. Kanatsoulis
Saurav Agarwal
Alejandro Ribeiro
335
1
0
29 Sep 2024
Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance
Conference on Robot Learning (CoRL), 2024
Renming Huang
Shaochong Liu
Yunqiang Pei
Peng Wang
Guoqing Wang
Yang Yang
Hengtao Shen
OffRL
328
1
0
06 Sep 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
596
1
0
22 Aug 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Neural Information Processing Systems (NeurIPS), 2024
Kevin Tan
Wei Fan
Yuting Wei
OffRL
382
6
0
08 Aug 2024
Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning
Yuxuan Jiang
Yujie Yang
Zhiqian Lan
Tianze Zhu
Shengbo Eben Li
Qi Sun
Jian Ma
Tianwen Yu
Changwu Zhang
175
5
0
21 Jul 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
294
4
0
10 Jul 2024
FOSP: Fine-tuning Offline Safe Policy through World Models
Chenyang Cao
Yucheng Xin
Silang Wu
Longxiang He
Zichen Yan
Junbo Tan
Xueqian Wang
OffRL
478
3
0
06 Jul 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
353
6
0
11 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
387
14
0
06 Jun 2024
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Haotian Hu
Yiqin Yang
Jianing Ye
Chengjie Wu
Ziqing Mai
Yujing Hu
Tangjie Lv
Changjie Fan
Qianchuan Zhao
Chongjie Zhang
OffRL
OnRL
321
9
0
31 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
296
7
0
28 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
433
97
0
26 May 2024
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong
Wei Xiong
Bo Pang
Haoxiang Wang
Han Zhao
Yingbo Zhou
Nan Jiang
Doyen Sahoo
Caiming Xiong
Tong Zhang
OffRL
318
234
0
13 May 2024
RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
International Conference on Machine Learning (ICML), 2024
Zelei Cheng
Xian Wu
Jiahao Yu
Sabrina Yang
Gang Wang
Xinyu Xing
OffRL
398
10
0
05 May 2024
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
Chenjia Bai
Lingxiao Wang
Jianye Hao
Zhuoran Yang
Bin Zhao
Zhen Wang
Xuelong Li
OffRL
320
12
0
30 Apr 2024
Optimal Design for Human Preference Elicitation
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
Branislav Kveton
434
0
0
22 Apr 2024
Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning
American Control Conference (ACC), 2024
Antonio Lopez
David Fridovich-Keil
296
3
0
18 Mar 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL
OnRL
399
5
0
07 Mar 2024
Advancing Investment Frontiers: Industry-grade Deep Reinforcement Learning for Portfolio Optimization
Philip Ndikum
Serge Ndikum
392
11
0
27 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
311
9
0
08 Feb 2024
Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
Talha Bozkus
Urbashi Mitra
OffRL
322
8
0
08 Feb 2024
Learning from Sparse Offline Datasets via Conservative Density Estimation
International Conference on Learning Representations (ICLR), 2024
Zhepeng Cen
Zuxin Liu
Zitong Wang
Yi-Fan Yao
Henry Lam
Ding Zhao
OffRL
311
12
0
16 Jan 2024
An Information Theoretic Approach to Interaction-Grounded Learning
International Conference on Machine Learning (ICML), 2024
Xiaoyan Hu
Farzan Farnia
Ho-fung Leung
VLM
502
3
0
10 Jan 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
506
337
0
18 Dec 2023
Advancing RAN Slicing with Offline Reinforcement Learning
International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2023
Kun Yang
Shu-ping Yeh
Menglei Zhang
J. Sydir
Jing Yang
Cong Shen
OffRL
254
10
0
16 Dec 2023
RLIF: Interactive Imitation Learning as Reinforcement Learning
International Conference on Learning Representations (ICLR), 2023
Jianlan Luo
Perry Dong
Yuexiang Zhai
Yi-An Ma
Sergey Levine
OffRL
530
31
0
21 Nov 2023