2507.03133
Cited By
v1
v2 (latest)
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models
3 July 2025
Boyang Xue
Qi Zhu
Rui Wang
Sheng Wang
Hongru Wang
Minda Hu
Fei Mi
Yasheng Wang
Lifeng Shang
Qun Liu
Kam-Fai Wong
LRM
Github (1245★)
Papers citing
"ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models"
RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning
Xinyuan Li
Murong Xu
Wenbiao Tao
Hanlun Zhu
Yike Zhao
Jipeng Zhang
Yunshi Lan
AIMat
LRM
289
0
0
06 Nov 2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov
Jasper Dekoninck
Martin Vechev
150
4
0
06 Oct 2025
On the Self-awareness of Large Reasoning Models' Capability Boundaries
Qingjie Zhang
Y. Fu
Yang Wang
Liu Yan
Tao Wei
Ke Xu
Shiyu Huang
Han Qiu
LRM
193
2
0
29 Sep 2025