ResearchTrend.AI
arXiv: 2406.06592

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

5 June 2024
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
Yunxuan Li
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM

Papers citing "Improve Mathematical Reasoning in Language Models by Automated Process Supervision"

50 / 107 papers shown
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
57
2
0
18 Feb 2025
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Susumu Takeuchi
39
0
0
18 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
48
2
0
17 Feb 2025
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
Xiaoyu Tan
Tianchu Yao
C. Qu
Bin Li
Minghao Yang
...
Haozhe Wang
Xihe Qiu
Wei Chu
Yinghui Xu
Yuan Qi
OffRL
LRM
44
2
0
17 Feb 2025
Integrating Expert Knowledge into Logical Programs via LLMs
Franciszek Górski
Oskar Wysocki
Marco Valentino
André Freitas
46
0
0
17 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
2
0
17 Feb 2025
Dyve: Thinking Fast and Slow for Dynamic Process Verification
Jianyuan Zhong
Z. Li
Zhijian Xu
Xiangyu Wen
Qiang Xu
LRM
36
2
0
16 Feb 2025
A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1
Jun Wang
LRM
KELM
42
1
0
15 Feb 2025
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Thomas Zeng
Shuibai Zhang
Shutong Wu
Christian Classen
Daewon Chae
...
Jungtaek Kim
H. Koo
K. Ramchandran
Dimitris Papailiopoulos
Kangwook Lee
LRM
60
2
0
10 Feb 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
LRM
59
3
0
10 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Bin Cui
Mengdi Wang
ReLM
LRM
AI4CE
96
10
0
10 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
115
3
0
06 Feb 2025
STAIR: Improving Safety Alignment with Introspective Reasoning
Y. Zhang
Siyuan Zhang
Yao Huang
Zeyu Xia
Zhengwei Fang
Xiao Yang
Ranjie Duan
Dong Yan
Yinpeng Dong
Jun Zhu
LRM
LLMSV
53
3
0
04 Feb 2025
Process-Supervised Reinforcement Learning for Code Generation
Yufan Ye
Ting Zhang
Wenbin Jiang
Hua Huang
OffRL
LRM
SyDa
55
1
0
03 Feb 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
L. Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
57
74
0
08 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Yiyao Yu
Xinzhe Ni
Zicheng Lin
Jin Zeng
Yujiu Yang
LRM
66
12
0
08 Jan 2025
Empowering Bengali Education with AI: Solving Bengali Math Word Problems through Transformer Models
Jalisha Jashim Era
Bidyarthi Paul
Tahmid Sattar Aothoi
Mirazur Rahman Zim
Faisal Muhammad Shah
33
0
0
07 Jan 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
Beichen Zhang
Yuhong Liu
Xiaoyi Dong
Yuhang Zang
Pan Zhang
Haodong Duan
Yuhang Cao
D. Lin
J. T. Wang
LRM
ReLM
56
2
0
06 Jan 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Shuangtao Li
Shuaihao Dong
Kexin Luan
Xinhan Di
Chaofan Ding
LRM
43
1
0
02 Jan 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Z. Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
51
90
0
30 Dec 2024
Outcome-Refining Process Supervision for Code Generation
Zhuohao Yu
Weizheng Gu
Yidong Wang
Zhengran Zeng
Jindong Wang
Wei Ye
Shikun Zhang
LRM
84
4
0
19 Dec 2024
Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling
Ziyi Ni
Yifan Li
Ning Yang
Dou Shen
Pin Lv
Daxiang Dong
LRM
61
0
0
19 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM
SyDa
86
75
0
12 Dec 2024
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
Teng Wang
Wing-Yin Yu
Zhenqi He
Zehua Liu
Xiongwei Han
...
Han Wu
Wei Shi
Ruifeng She
Fangzhou Zhu
Tao Zhong
AIMat
OffRL
LRM
73
3
0
26 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Ben He
...
Le Sun
Jie Lou
Bowen Yu
Y. Lu
Hongyu Lin
ALM
79
2
0
18 Nov 2024
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Runhui Huang
...
Yihan Zeng
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
LRM
101
10
0
18 Nov 2024
Process Supervision-Guided Policy Optimization for Code Generation
Ning Dai
Zheng Wu
Renjie Zheng
Ziyun Wei
Wenlei Shi
Xing Jin
Guanlin Liu
Chen Dun
Liang Huang
Lin Yan
54
7
0
23 Oct 2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jiaxuan Gao
Shusheng Xu
Wenjie Ye
Weilin Liu
Chuyi He
Wei Fu
Zhiyu Mei
Guangju Wang
Yi Wu
OffRL
LRM
23
12
0
19 Oct 2024
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
Zhenyu Wu
Qingkai Zeng
Z. Zhang
Zhaoxuan Tan
Chao Shen
Meng-Long Jiang
KELM
LRM
34
4
0
16 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
51
36
0
16 Oct 2024
Process Reward Model with Q-Value Rankings
W. Li
Yixuan Li
LRM
45
14
0
15 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
Joshua Ong Jun Leang
Aryo Pradipta Gema
Shay B. Cohen
ReLM
LRM
ReCod
31
2
0
14 Oct 2024
Agentic Information Retrieval
Weinan Zhang
Junwei Liao
Ning Li
Kounianhua Du
Jianghao Lin
AIFin
41
2
0
13 Oct 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Ziyu Wan
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
W. Zhang
LRM
18
30
0
12 Oct 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
31
41
0
10 Oct 2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
52
6
0
10 Oct 2024
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
Ruosen Li
Ziming Luo
Xinya Du
LRM
26
0
0
08 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Y. Li
Pengfei Liu
VLM
37
67
0
08 Oct 2024
OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang
Peter Just
Krishna Narayanan
Chao Tian
30
3
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
32
17
0
05 Oct 2024
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Murong Yue
Wenlin Yao
Haitao Mi
Dian Yu
Ziyu Yao
Dong Yu
LRM
30
4
0
04 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
27
42
0
03 Oct 2024
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning
Huimu Yu
Xing Wu
Weidong Yin
Debing Zhang
Songlin Hu
LRM
20
5
0
03 Oct 2024
Evaluating Robustness of Reward Models for Mathematical Reasoning
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Jungsoo Won
Dongha Lee
Jinyoung Yeo
23
4
0
02 Oct 2024
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Amirhossein Kazemnejad
Milad Aghajohari
Eva Portelance
Alessandro Sordoni
Siva Reddy
Aaron C. Courville
Nicolas Le Roux
OffRL
LRM
20
21
0
02 Oct 2024
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng
Xiang Kong
Shuang Ma
Aonan Zhang
Dong Yin
Chong-Jun Wang
Ruoming Pang
Yiming Yang
LRM
20
0
0
02 Oct 2024
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf
Binyamin Rothberg
Dorin Shteyman
Amnon Shashua
13
0
0
26 Sep 2024
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation
Ioannis Panagiotopoulos
Giorgos Filandrianos
Maria Lymperaiou
Giorgos Stamou
LRM
ReLM
31
0
0
24 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
M. Zhang
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
38
3
0
19 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Z. Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
41
11
0
04 Sep 2024