
Evaluating Step-by-step Reasoning Traces: A Survey

17 February 2025
Jinu Lee
Julia Hockenmaier
Abstract

Step-by-step reasoning is widely used to enhance the reasoning ability of large language models (LLMs) on complex problems. Evaluating the quality of reasoning traces is crucial for understanding and improving LLM reasoning. However, existing evaluation practices are highly inconsistent, resulting in fragmented progress across evaluator design and benchmark development. To address this gap, this survey provides a comprehensive overview of step-by-step reasoning evaluation, proposing a taxonomy of evaluation criteria with four top-level categories (factuality, validity, coherence, and utility). Based on the taxonomy, we review different evaluator implementations and recent findings, leading to promising directions for future research.
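The step-level evaluation setup the abstract describes can be sketched as a loop that scores each step of a trace against the four top-level criteria. This is a minimal illustrative sketch, not the survey's method: the `StepEvaluation` container, the `scorers` interface, and the toy coherence scorer are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# The four top-level criteria named in the survey's taxonomy.
CRITERIA = ("factuality", "validity", "coherence", "utility")

@dataclass
class StepEvaluation:
    """Hypothetical container: one reasoning step with its per-criterion scores."""
    step: str
    scores: dict  # criterion name -> score in [0, 1]

def evaluate_trace(steps, scorers):
    """Score each step of a reasoning trace against every criterion.

    `scorers` maps a criterion name to a callable
    (step, prior_steps) -> float, so each criterion can condition on
    the steps that came before (as coherence-style criteria must).
    """
    results = []
    for i, step in enumerate(steps):
        scores = {c: scorers[c](step, steps[:i]) for c in CRITERIA}
        results.append(StepEvaluation(step, scores))
    return results

# Toy stand-in for a coherence scorer: a step is "coherent" if it
# shares at least one token with some earlier step.
def toy_coherence(step, prior_steps):
    if not prior_steps:
        return 1.0
    words = {w for p in prior_steps for w in p.split()}
    return 1.0 if any(w in words for w in step.split()) else 0.0
```

In practice each scorer would be backed by a learned verifier or an LLM judge rather than a token-overlap heuristic; the point is only that the taxonomy factors trace evaluation into independent per-step, per-criterion judgments.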

@article{lee2025_2502.12289,
  title={Evaluating Step-by-step Reasoning Traces: A Survey},
  author={Jinu Lee and Julia Hockenmaier},
  journal={arXiv preprint arXiv:2502.12289},
  year={2025}
}