
arXiv:2508.19229
StepWiser: Stepwise Generative Judges for Wiser Reasoning

26 August 2025
Wei Xiong
Wenting Zhao
Weizhe Yuan
O. Yu. Golovneva
Tong Zhang
Jason Weston
Sainbayar Sukhbaatar
LRM
arXiv (abs) · PDF · HTML · HuggingFace (17 upvotes) · GitHub (916★)

Papers citing "StepWiser: Stepwise Generative Judges for Wiser Reasoning"

9 / 9 papers shown

Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models
Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, ..., Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan
LRM
228 · 0 · 0
09 Nov 2025

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty
OffRL · ALM · LRM · ELM
221 · 0 · 0
20 Oct 2025

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
LRM
103 · 0 · 0
13 Oct 2025

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao
ReLM · LRM · AI4CE
278 · 0 · 0
02 Oct 2025

Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Shen, Yu Xia, Jonathan D. Chang, Prithviraj Ammanabrolu
160 · 0 · 0
01 Oct 2025

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Heyang Gao, Guoqing Liu, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Xu Chen
148 · 0 · 0
26 Sep 2025

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
LRM
128 · 0 · 0
26 Sep 2025

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing-Fu Huang, Tong Zhang, Anurag Beniwal
LRM
132 · 10 · 0
03 Sep 2025

Lost at the Beginning of Reasoning
Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz
LRM
206 · 5 · 0
27 Jun 2025