ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Policy Filtration in RLHF to Fine-Tune LLM for Code Generation

11 September 2024
Wei Shen, Chuheng Zhang
Tag: OffRL
arXiv: 2409.06957

Papers citing "Policy Filtration in RLHF to Fine-Tune LLM for Code Generation"

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen, Guanlin Liu, Zheng Wu, Ruofei Zhu, Qingping Yang, Chao Xin, Yu Yue, Lin Yan
28 Mar 2025

Categorical Reparameterization with Gumbel-Softmax
Eric Jang, S. Gu, Ben Poole
Tag: BDL
03 Nov 2016