ResearchTrend.AI

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF


7 November 2024
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
    OffRL

Papers citing "Sharp Analysis for KL-Regularized Contextual Bandits and RLHF"

2 / 2 papers shown
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
26 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
09 Feb 2025