ResearchTrend.AI
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

10 February 2025
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
Wenzhe Li
Yingqing Guo
Tianle Cai
Hui Yuan
Runzhe Wang
Yue Wu
Ming Yin
Shange Tang
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
    AAML
    LRM

Papers citing "MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations"

6 / 6 papers shown
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Zaid Khan
Elias Stengel-Eskin
Archiki Prasad
Jaemin Cho
Mohit Bansal
26
0
0
14 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
87
3
0
01 Apr 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Y. Zou
Nazneen Rajani
AAML
LRM
37
2
0
03 Mar 2025
À la recherche du sens perdu: your favourite LLM might have more to say than you can understand
K. O. T. Erziev
26
0
0
28 Feb 2025
Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations
Chunyang Li
Weiqi Wang
Tianshi Zheng
Y. Song
LRM
36
2
0
22 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
48
2
0
18 Feb 2025