2409.17401
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
25 September 2024
Qining Zhang
Lei Ying
OffRL
Papers citing "Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference"
No papers