ResearchTrend.AI

arXiv:2508.19229
StepWiser: Stepwise Generative Judges for Wiser Reasoning

26 August 2025
Wei Xiong, Wenting Zhao, Weizhe Yuan, O. Yu. Golovneva, Tong Zhang, Jason Weston, Sainbayar Sukhbaatar
    LRM
arXiv (abs) · PDF · HTML · HuggingFace (17 upvotes) · GitHub (916★)

Papers citing "StepWiser: Stepwise Generative Judges for Wiser Reasoning"

9 / 9 papers shown
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, ..., Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan
LRM · 214 · 0 · 0 · 09 Nov 2025
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty
OffRL · ALM · LRM · ELM · 221 · 0 · 0 · 20 Oct 2025
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
LRM · 89 · 0 · 0 · 13 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao
ReLM · LRM · AI4CE · 274 · 0 · 0 · 02 Oct 2025
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Shen, Yu Xia, Jonathan D. Chang, Prithviraj Ammanabrolu
148 · 0 · 0 · 01 Oct 2025
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Heyang Gao, Guoqing Liu, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Xu Chen
144 · 0 · 0 · 26 Sep 2025
ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
LRM · 128 · 0 · 0 · 26 Sep 2025
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing-Fu Huang, Tong Zhang, Anurag Beniwal
LRM · 132 · 9 · 0 · 03 Sep 2025
Lost at the Beginning of Reasoning
Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz
LRM · 194 · 4 · 0 · 27 Jun 2025