ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.04157
  4. Cited By
The Role of Coverage in Online Reinforcement Learning

The Role of Coverage in Online Reinforcement Learning

9 October 2022
Tengyang Xie
Dylan J. Foster
Yu Bai
Nan Jiang
Sham Kakade
    OffRL
ArXivPDFHTML

Papers citing "The Role of Coverage in Online Reinforcement Learning"

41 / 41 papers shown
Title
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
77
0
0
26 Feb 2025
Decision Making in Hybrid Environments: A Model Aggregation Approach
Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu
Chen-Yu Wei
Julian Zimmert
83
0
0
09 Feb 2025
Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
Jayden Teoh
Wenjun Li
Pradeep Varakantham
53
1
0
08 Feb 2025
Hybrid Preference Optimization for Alignment: Provably Faster
  Convergence Rates by Combining Offline Preferences with Online Exploration
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
71
1
0
13 Dec 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
The Central Role of the Loss Function in Reinforcement Learning
The Central Role of the Loss Function in Reinforcement Learning
Kaiwen Wang
Nathan Kallus
Wen Sun
OffRL
41
7
0
19 Sep 2024
Coverage Analysis of Multi-Environment Q-Learning Algorithms for
  Wireless Network Optimization
Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization
Talha Bozkus
Urbashi Mitra
28
2
0
29 Aug 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Kevin Tan
Wei Fan
Yuting Wei
OffRL
69
2
0
08 Aug 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
74
2
0
11 Jun 2024
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy
  Evaluation
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
Jeongyeol Kwon
Shie Mannor
C. Caramanis
Yonathan Efroni
OffRL
29
2
0
03 Jun 2024
Exploratory Preference Optimization: Harnessing Implicit
  Q*-Approximation for Sample-Efficient RLHF
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie
Dylan J. Foster
Akshay Krishnamurthy
Corby Rosset
Ahmed Hassan Awadallah
Alexander Rakhlin
41
33
0
31 May 2024
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Haotian Hu
Yiqin Yang
Jianing Ye
Chengjie Wu
Ziqing Mai
Yujing Hu
Tangjie Lv
Changjie Fan
Qianchuan Zhao
Chongjie Zhang
OffRL
OnRL
37
3
0
31 May 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with
  General Preferences
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
144
114
0
04 Apr 2024
Multiple-policy Evaluation via Density Estimation
Multiple-policy Evaluation via Density Estimation
Yilei Chen
Aldo Pacchiano
I. Paschalidis
OffRL
19
0
0
29 Mar 2024
The Value of Reward Lookahead in Reinforcement Learning
The Value of Reward Lookahead in Reinforcement Learning
Nadav Merlis
Dorian Baudry
Vianney Perchet
13
0
0
18 Mar 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited
  Coverage
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL
OnRL
34
4
0
07 Mar 2024
Advancing Investment Frontiers: Industry-grade Deep Reinforcement
  Learning for Portfolio Optimization
Advancing Investment Frontiers: Industry-grade Deep Reinforcement Learning for Portfolio Optimization
Philip Ndikum
Serge Ndikum
44
1
0
27 Feb 2024
On the Performance of Empirical Risk Minimization with Smoothed Data
On the Performance of Empirical Risk Minimization with Smoothed Data
Adam Block
Alexander Rakhlin
Abhishek Shetty
39
3
0
22 Feb 2024
Off-Policy Evaluation in Markov Decision Processes under Weak
  Distributional Overlap
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Mohammad Mehrabi
Stefan Wager
OffRL
24
14
0
13 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General
  Preference Model
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
38
9
0
11 Feb 2024
Model-Based RL for Mean-Field Games is not Statistically Harder than
  Single-Agent RL
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
Jiawei Huang
Niao He
Andreas Krause
24
6
0
08 Feb 2024
Mitigating Covariate Shift in Misspecified Regression with Applications
  to Reinforcement Learning
Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning
P. Amortila
Tongyi Cao
Akshay Krishnamurthy
OffRL
OOD
38
1
0
22 Jan 2024
Agnostic Interactive Imitation Learning: New Theory and Practical
  Algorithms
Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
Yichen Li
Chicheng Zhang
OffRL
31
0
0
28 Dec 2023
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
30
8
0
14 Nov 2023
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
Michal Nauman
Marek Cygan
27
1
0
30 Oct 2023
Unsupervised Behavior Extraction via Random Intent Priors
Unsupervised Behavior Extraction via Random Intent Priors
Haotian Hu
Yiqin Yang
Jianing Ye
Ziqing Mai
Chongjie Zhang
OffRL
32
6
0
28 Oct 2023
When is Agnostic Reinforcement Learning Statistically Tractable?
When is Agnostic Reinforcement Learning Statistically Tractable?
Zeyu Jia
Gene Li
Alexander Rakhlin
Ayush Sekhari
Nathan Srebro
OffRL
22
5
0
09 Oct 2023
Efficient Model-Free Exploration in Low-Rank MDPs
Efficient Model-Free Exploration in Low-Rank MDPs
Zakaria Mhammedi
Adam Block
Dylan J. Foster
Alexander Rakhlin
OffRL
19
13
0
08 Jul 2023
Active Coverage for PAC Reinforcement Learning
Active Coverage for PAC Reinforcement Learning
Aymen Al Marjani
Andrea Tirinzoni
E. Kaufmann
OffRL
16
4
0
23 Jun 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning,
  and Exploration
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu
Miao Lu
Wei Xiong
Han Zhong
Haotian Hu
Shenao Zhang
Sirui Zheng
Zhuoran Yang
Zhaoran Wang
OffRL
32
22
0
29 May 2023
The Benefits of Being Distributional: Small-Loss Bounds for
  Reinforcement Learning
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Kaiwen Wang
Kevin Zhou
Runzhe Wu
Nathan Kallus
Wen Sun
OffRL
26
17
0
25 May 2023
On the Statistical Efficiency of Mean Field Reinforcement Learning with
  General Function Approximation
On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation
Jiawei Huang
Batuhan Yardim
Niao He
31
10
0
18 May 2023
What can online reinforcement learning with function approximation
  benefit from general coverage conditions?
What can online reinforcement learning with function approximation benefit from general coverage conditions?
Fanghui Liu
Luca Viano
V. Cevher
OffRL
12
2
0
25 Apr 2023
Eluder-based Regret for Stochastic Contextual MDPs
Eluder-based Regret for Stochastic Contextual MDPs
Orin Levy
Asaf B. Cassel
Alon Cohen
Yishay Mansour
23
5
0
27 Nov 2022
Model-Free Reinforcement Learning with the Decision-Estimation
  Coefficient
Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
Dylan J. Foster
Noah Golowich
Jian Qian
Alexander Rakhlin
Ayush Sekhari
OffRL
30
9
0
25 Nov 2022
When is Realizability Sufficient for Off-Policy Reinforcement Learning?
When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Andrea Zanette
OffRL
11
14
0
10 Nov 2022
Leveraging Offline Data in Online Reinforcement Learning
Leveraging Offline Data in Online Reinforcement Learning
Andrew Wagenmaker
Aldo Pacchiano
OffRL
OnRL
27
36
0
09 Nov 2022
Provably Efficient Reinforcement Learning with Linear Function
  Approximation Under Adaptivity Constraints
Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
Chi Jin
Zhuoran Yang
Zhaoran Wang
OffRL
107
166
0
06 Jan 2021
Reward-Free Exploration for Reinforcement Learning
Reward-Free Exploration for Reinforcement Learning
Chi Jin
A. Krishnamurthy
Max Simchowitz
Tiancheng Yu
OffRL
104
194
0
07 Feb 2020
Optimism in Reinforcement Learning with Generalized Linear Function
  Approximation
Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang
Ruosong Wang
S. Du
A. Krishnamurthy
127
135
0
09 Dec 2019
Deep Reinforcement Learning for Dialogue Generation
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
198
1,327
0
05 Jun 2016
1