2409.06957
Cited By
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
11 September 2024
Wei Shen
Chuheng Zhang
OffRL
Papers citing "Policy Filtration in RLHF to Fine-Tune LLM for Code Generation"
2 / 2 papers shown
Title
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen, Guanlin Liu, Zheng Wu, Ruofei Zhu, Qingping Yang, Chao Xin, Yu Yue, Lin Yan
82
8
0
28 Mar 2025
Categorical Reparameterization with Gumbel-Softmax
Eric Jang, S. Gu, Ben Poole
BDL
75
5,262
0
03 Nov 2016