arXiv: 2507.03133
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models

3 July 2025

arXiv (abs) | PDF | HTML | GitHub (1245★)

Papers citing "ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models"

RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning

289

0

0

06 Nov 2025

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Jasper Dekoninck

150

4

0

06 Oct 2025

On the Self-awareness of Large Reasoning Models' Capability Boundaries

193

2

0

29 Sep 2025