ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.10080
  4. Cited By
Let's reward step by step: Step-Level reward model as the Navigators for
  Reasoning

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning

16 October 2023
Qianli Ma
Haotian Zhou
Tingkai Liu
Jianbo Yuan
Pengfei Liu
Yang You
Hongxia Yang
    LRM
ArXivPDFHTML

Papers citing "Let's reward step by step: Step-Level reward model as the Navigators for Reasoning"

13 / 13 papers shown
Title
Exploring Expert Failures Improves LLM Agent Tuning
Exploring Expert Failures Improves LLM Agent Tuning
Li-Cheng Lan
Andrew Bai
Minhao Cheng
Ruochen Wang
Cho-Jui Hsieh
LRM
43
0
0
17 Apr 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li
Yinyi Luo
Anudeep Bolimera
Uzair Ahmed
S.
Hrishikesh Gokhale
Marios Savvides
LRM
AI4CE
52
1
0
06 Mar 2025
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Rudra Murthy
Prince Kumar
Praveen Venkateswaran
Danish Contractor
KELM
ALM
ELM
24
1
0
16 Oct 2024
On the Transformations across Reward Model, Parameter Update, and
  In-Context Prompt
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
21
2
0
24 Jun 2024
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process
  Refinement
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
Weimin Xiong
Yifan Song
Xiutian Zhao
Wenhao Wu
Xun Wang
Ke Wang
Cheng Li
Wei Peng
Sujian Li
29
25
0
17 Jun 2024
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen
Yibo Wang
Yi-Feng Wu
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Lijun Zhang
LLMAG
LRM
41
10
0
11 Jun 2024
Evaluating Mathematical Reasoning Beyond Accuracy
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia
Xuefeng Li
Yixin Liu
Tongshuang Wu
Pengfei Liu
LRM
ReLM
42
21
0
08 Apr 2024
Self-Evaluation Guided Beam Search for Reasoning
Self-Evaluation Guided Beam Search for Reasoning
Yuxi Xie
Kenji Kawaguchi
Yiran Zhao
Xu Zhao
MingSung Kan
Junxian He
Qizhe Xie
LRM
156
128
0
01 May 2023
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
1