ResearchTrend.AI
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

10 October 2024
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
    OffRL
    LRM
arXiv: 2410.08146

Papers citing "Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning"

35 / 35 papers shown
Title
RM-R1: Reward Modeling as Reasoning
X. Chen
Gaotang Li
Z. Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
94
0
0
05 May 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
86
0
0
21 Apr 2025
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe
LRM
60
1
0
14 Apr 2025
Reasoning without Regret
Tarun Chitra
OffRL
LRM
23
0
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
38
2
0
12 Apr 2025
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
Zhuo Zhi
Qiangqiang Wu
Minghe Shen
W. J. Li
Yinchuan Li
Kun Shao
Kaiwen Zhou
LLMAG
33
0
0
06 Apr 2025
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
Ram Ramrakhya
Matthew Chang
Xavier Puig
Ruta Desai
Z. Kira
Roozbeh Mottaghi
LLMAG
LM&Ro
64
0
0
01 Apr 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
39
0
0
30 Mar 2025
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Jiakai Tang
Sunhao Dai
Teng Shi
Jun Xu
X. Chen
Wen Chen
Wu Jian
Yuning Jiang
LRM
63
5
0
28 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
89
2
0
26 Mar 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
54
2
0
19 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
53
0
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
1
0
16 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
46
4
0
13 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
42
1
0
11 Mar 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu
Matthew Y. R. Yang
Amrith Rajagopal Setlur
Lewis Tunstall
E. Beeching
Ruslan Salakhutdinov
Aviral Kumar
OffRL
62
11
0
10 Mar 2025
Process-Supervised LLM Recommenders via Flow-guided Tuning
Chongming Gao
Mengyao Gao
Chenxiao Fan
Shuai Yuan
Wentao Shi
Xiangnan He
74
2
0
10 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
86
31
0
10 Mar 2025
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee
Ethan Che
Tianyi Peng
LRM
42
10
0
03 Mar 2025
Multi-Turn Code Generation Through Single-Step Rewards
A. Jain
Gonzalo Gonzalez-Pumariega
Wayne Chen
Alexander M. Rush
Wenting Zhao
Sanjiban Choudhury
LRM
47
1
0
27 Feb 2025
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
Yudi Zhang
Lu Wang
Meng Fang
Yali Du
Chenghua Huang
...
Qingwei Lin
Mykola Pechenizkiy
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
ALM
71
0
0
26 Feb 2025
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction
Suozhi Huang
Peiyang Song
Robert Joseph George
Anima Anandkumar
AI4TS
LRM
42
2
0
25 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
143
1
0
21 Feb 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
57
2
0
18 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Bin Cui
Mengdi Wang
ReLM
LRM
AI4CE
96
10
0
10 Feb 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen
Guangtao Zeng
Zhenting Qi
Zhang-Wei Hong
Zhenfang Chen
Wei Lu
G. Wornell
Subhro Das
David D. Cox
Chuang Gan
LLMAG
LRM
112
5
0
04 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Z. Wang
Hanbin Wang
Wendi Li
...
Yu Cheng
Zhiyuan Liu
Maosong Sun
Bowen Zhou
Ning Ding
OffRL
LRM
68
51
0
03 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
70
53
0
28 Jan 2025
Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
Y. Hu
Sheng Ouyang
Yong Liu
LRM
29
0
0
23 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
D. Lin
Jiaqi Wang
VLM
70
18
0
21 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Yiyao Yu
Xinzhe Ni
Zicheng Lin
Jin Zeng
Yujiu Yang
LRM
68
12
0
08 Jan 2025
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
82
11
0
26 Nov 2024
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
Teng Wang
Wing-Yin Yu
Zhenqi He
Zehua Liu
Xiongwei Han
...
Han Wu
Wei Shi
Ruifeng She
Fangzhou Zhu
Tao Zhong
AIMat
OffRL
LRM
78
3
0
26 Nov 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
108
63
0
25 Nov 2024
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Amirhossein Kazemnejad
Milad Aghajohari
Eva Portelance
Alessandro Sordoni
Siva Reddy
Aaron C. Courville
Nicolas Le Roux
OffRL
LRM
25
21
0
02 Oct 2024