arXiv: 2507.03133
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models

3 July 2025
Boyang Xue
Qi Zhu
Rui Wang
Sheng Wang
Hongru Wang
Minda Hu
Fei Mi
Yasheng Wang
Lifeng Shang
Qun Liu
Kam-Fai Wong
    LRM
ArXiv (abs) | PDF | HTML | GitHub (1245★)

Papers citing "ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models"

3 / 3 papers shown
RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning
Xinyuan Li
Murong Xu
Wenbiao Tao
Hanlun Zhu
Yike Zhao
Jipeng Zhang
Yunshi Lan
AIMat, LRM
289
0
0
06 Nov 2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov
Jasper Dekoninck
Martin Vechev
150
4
0
06 Oct 2025
On the Self-awareness of Large Reasoning Models' Capability Boundaries
Qingjie Zhang
Y. Fu
Yang Wang
Liu Yan
Tao Wei
Ke Xu
Shiyu Huang
Han Qiu
LRM
193
2
0
29 Sep 2025