ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.01679
  4. Cited By
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit
  Assignment

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

2 October 2024
Amirhossein Kazemnejad
Milad Aghajohari
Eva Portelance
Alessandro Sordoni
Siva Reddy
Aaron C. Courville
Nicolas Le Roux
    OffRL
    LRM
ArXivPDFHTML

Papers citing "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

4 / 4 papers shown
Title
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
113
2
0
29 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
76
1
0
07 Apr 2025
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
J. Zhang
Y. Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL
LRM
ReLM
49
20
0
20 Jan 2025
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
77
5
0
23 Oct 2024
1