2406.06592
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
5 June 2024
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
Yunxuan Li
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
Papers citing
"Improve Mathematical Reasoning in Language Models by Automated Process Supervision"
50 / 107 papers shown
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
70
1
0
05 May 2025
SymPlanner: Deliberate Planning in Language Models with Symbolic Representation
Siheng Xiong
Jieyu Zhou
Zhangding Liu
Yusen Su
LLMAG
LM&Ro
73
0
0
02 May 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
54
0
0
28 Apr 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
45
2
0
21 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
66
7
1
14 Apr 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
24
0
0
14 Apr 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
Zhaopeng Feng
Shaosheng Cao
Jiahan Ren
Jiayuan Su
Ruizhe Chen
Yan Zhang
Zhe Xu
Yao Hu
Jian Wu
Zuozhu Liu
ALM
LRM
58
1
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
38
2
0
12 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
X. Wang
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
69
1
0
10 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
21
1
0
09 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
37
0
0
01 Apr 2025
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
Jianhao Chen
Zishuo Xun
Bocheng Zhou
Han Qi
Qiaosheng Zhang
...
Wei Hu
Yuzhong Qu
W. Ouyang
Wanli Ouyang
Shuyue Hu
74
0
0
01 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
57
0
0
31 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
40
0
0
29 Mar 2025
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Jiakai Tang
Sunhao Dai
Teng Shi
Jun Xu
X. Chen
Wen Chen
Wu Jian
Yuning Jiang
LRM
63
5
0
28 Mar 2025
R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
Junxiao Liu
Yifeng Liu
Jiajun Chen
Xin Huang
Shujian Huang
LRM
36
2
0
27 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
74
2
0
26 Mar 2025
Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao
Y. Wu
Minghe Gao
Qifan Yu
Wendong Bu
Wenqiao Zhang
Yunfei Li
Siliang Tang
Tat-Seng Chua
Juncheng Billy Li
LLMAG
LRM
56
0
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
Y. Li
Jiahao Xu
Tian Liang
Xingyu Chen
Zhiwei He
...
Rui Wang
Z. Zhang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
43
1
0
21 Mar 2025
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang
Fanghua Ye
Jian Li
Siyu Yuan
Yikai Zhang
Zhaopeng Tu
Xiaolong Li
Deqing Yang
LLMAG
68
2
0
20 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
54
1
0
20 Mar 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
54
2
0
19 Mar 2025
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
Vaibhav Aggarwal
Ojasv Kamal
Abhinav Japesh
Zhijing Jin
Bernhard Schölkopf
50
1
0
18 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
53
0
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
89
2
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
1
0
16 Mar 2025
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
Zhaopeng Feng
Jiahan Ren
Jiayuan Su
Jiamei Zheng
Zhihang Tang
Hongwei Wang
Zuozhu Liu
LRM
46
1
0
15 Mar 2025
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero
Michael Hersche
Roger Wattenhofer
A. Sebastian
Abbas Rahimi
LRM
48
1
0
14 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
46
4
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
60
10
0
13 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
42
1
0
11 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
86
31
0
10 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
51
6
0
08 Mar 2025
Better Process Supervision with Bi-directional Rewarding Signals
Wenxiang Chen
Wei He
Zhiheng Xi
Honglin Guo
Boyang Hong
...
Nijun Li
Tao Gui
Yun Li
Qi Zhang
Xuanjing Huang
LRM
48
2
0
06 Mar 2025
AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Songming Zhang
Xue Zhang
Tong Zhang
Bojie Hu
Yufeng Chen
Jinan Xu
44
1
0
04 Mar 2025
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
Wei Sun
Qianlong Du
Fuwei Cui
Jiajun Zhang
OffRL
LRM
31
0
0
04 Mar 2025
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
Joykirat Singh
Tanmoy Chakraborty
A. Nambi
AI4Cl
LRM
ReLM
55
1
0
04 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
88
28
0
03 Mar 2025
Towards Widening The Distillation Bottleneck for Reasoning Models
Huifeng Yin
Yu Zhao
M. Wu
Xuanfan Ni
Bo Zeng
...
Liangying Shao
Chenyang Lyu
Longyue Wang
Weihua Luo
Kaifu Zhang
LRM
42
1
0
03 Mar 2025
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Miao Peng
Nuo Chen
Zongrui Suo
Jia Li
LRM
31
0
0
02 Mar 2025
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu
Yi-Chen Li
Xuqin Zhang
Tianyi Zhang
Y. Zhang
Zongzhang Zhang
Yang Yu
ALM
46
0
0
01 Mar 2025
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
Dawei Zhu
Xiyu Wei
Guangxiang Zhao
Wenhao Wu
Haosheng Zou
Junfeng Ran
Xun Wang
Lin Sun
Xiangzheng Zhang
Sujian Li
LRM
56
0
0
28 Feb 2025
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta
Junxiong Wang
Matteo Pagliardini
Kevin Y. Li
Aviv Bick
J. Zico Kolter
Albert Gu
F. Fleuret
Tri Dao
ReLM
LRM
43
7
0
27 Feb 2025
Complex LLM Planning via Automated Heuristics Discovery
Hongyi Ling
Shubham Parashar
Sambhav Khurana
Blake Olson
Anwesha Basu
Gaurangi Sinha
Z. Tu
James Caverlee
Shuiwang Ji
97
2
0
26 Feb 2025
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son
Jiwoo Hong
Hyunwoo Ko
James Thorne
LRM
46
5
0
24 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
125
1
0
21 Feb 2025
Improving Value-based Process Verifier via Structural Prior Injection
Zetian Sun
Dongfang Li
Baotian Hu
Jun Yu
Min-Ling Zhang
35
0
0
21 Feb 2025