Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

27 March 2025

Papers citing "Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad"


Title
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning | Liam Boyle, Nicolas Baumann, Paviththiren Sivasothilingam, Michele Magno, Luca Benini | LM&Ro, LRM | 37 0 0 | 06 May 2025
Phi-4-reasoning Technical Report | Marah Abdin, Sahaj Agarwal, Ahmed Hassan Awadallah, Vidhisha Balachandran, Harkirat Singh Behl, ..., Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng | ReLM, LRM | 77 0 0 | 30 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Y. Sun, Zeyu Cai, ..., Changkun Shao, Qing-Hong Cao, Ming-xing Luo, Muhan Zhang, Hua Xing Zhu | AIMat, LRM | 24 0 0 | 22 Apr 2025
AGI Is Coming... Right After AI Learns to Play Wordle | Sarath Shekkizhar, Romain Cosentino | LLMAG | 35 0 0 | 21 Apr 2025
Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability | Jennifer Haase, P. Hanel, Sebastian Pokutta | ALM, LRM | 60 0 0 | 10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao, Samuel Albanie, Ameya Prabhu, Matthias Bethge | ReLM, ALM, LRM | 66 4 0 | 09 Apr 2025