2411.04625
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
7 November 2024
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
Papers citing "Sharp Analysis for KL-Regularized Contextual Bandits and RLHF"
2 / 2 papers shown
Title
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
66
0
0
26 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
35
0
0
09 Feb 2025