Policy Filtration in RLHF to Fine-Tune LLM for Code Generation

11 September 2024

Papers citing "Policy Filtration in RLHF to Fine-Tune LLM for Code Generation"

Title
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback | Wei Shen, Guanlin Liu, Zheng Wu, Ruofei Zhu, Qingping Yang, Chao Xin, Yu Yue, Lin Yan | 82 | 8 | 0 | 28 Mar 2025
Categorical Reparameterization with Gumbel-Softmax | Eric Jang, S. Gu, Ben Poole | BDL | 75 | 5,262 | 0 | 03 Nov 2016