ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.05229
  4. Cited By
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in
  Large Language Models

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

7 October 2024
Iman Mirzadeh
Keivan Alizadeh
Hooman Shahrokhi
Oncel Tuzel
Samy Bengio
Mehrdad Farajtabar
    AIMat
    LRM
ArXivPDFHTML

Papers citing "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"

50 / 89 papers shown
Title
Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models
Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models
John Hawkins
ReLM
LRM
34
0
0
08 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
40
0
0
07 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
36
0
0
07 May 2025
R^3-VQA: "Read the Room" by Video Social Reasoning
R^3-VQA: "Read the Room" by Video Social Reasoning
Lixing Niu
Jiapeng Li
Xingping Yu
Shu Wang
Ruining Feng
Bo Wu
Ping Wei
Y. Wang
Lifeng Fan
36
0
0
07 May 2025
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
Daniel Weitekamp
M. N. Siddiqui
Christopher James Maclellan
LLMAG
ELM
20
0
0
02 May 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
22
0
0
23 Apr 2025
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
Ivan Vulić
Z. Zhang
Liang Wang
T. Tan
Furu Wei
LRM
31
1
0
21 Apr 2025
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework
Zeyu Wang
Frank P.-W. Lo
Qian Chen
Yongqi Zhang
Chen Lin
Xu Chen
Zhenhua Yu
Alexander J. Thompson
Eric M. Yeatman
Benny P. L. Lo
AI4CE
21
0
0
20 Apr 2025
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Liyi Zhang
Veniamin Veselovsky
R. Thomas McCoy
Thomas L. Griffiths
49
0
0
17 Apr 2025
Sleep-time Compute: Beyond Inference Scaling at Test-time
Sleep-time Compute: Beyond Inference Scaling at Test-time
Kevin Lin
Charlie Snell
Y. Wang
Charles Packer
Sarah Wooders
Ion Stoica
Joseph E. Gonzalez
32
1
0
17 Apr 2025
FLIP Reasoning Challenge
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
69
0
0
16 Apr 2025
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM
LRM
23
1
0
15 Apr 2025
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee
Ngoc-Hieu Nguyen
Kazem Meidani
A. Farimani
Khoa D. Doan
Chandan K. Reddy
21
0
0
14 Apr 2025
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Zaid Khan
Elias Stengel-Eskin
Archiki Prasad
Jaemin Cho
Mohit Bansal
29
0
0
14 Apr 2025
DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning
DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning
Atharva Pandey
Kshitij Dubey
Rahul Sharma
Amit Sharma
ReLM
ELM
LRM
49
0
0
09 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
58
4
0
09 Apr 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Rem Yang
Julian Dai
N. Vasilakis
Martin Rinard
ELM
LRM
27
0
0
07 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
22
0
0
04 Apr 2025
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann
Jie-jin Yang
LRM
44
0
0
02 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
87
3
0
01 Apr 2025
Medical large language models are easily distracted
Medical large language models are easily distracted
Krithik Vishwanath
Anton Alyakin
Daniel Alber
Jin Vivian Lee
Douglas Kondziolka
E. Oermann
26
0
0
01 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
40
1
0
01 Apr 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
35
0
0
31 Mar 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
32
0
0
30 Mar 2025
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Yihuai Hong
Dian Zhou
Meng Cao
Lei Yu
Zhijing Jin
LRM
41
0
0
29 Mar 2025
TRA: Better Length Generalisation with Threshold Relative Attention
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
37
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
45
0
0
28 Mar 2025
VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU
VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU
Zhongchun Zheng
Long Cheng
Lu Li
Rodrigo C. O. Rocha
Tianyi Liu
Wei Wei
X. Zhang
Yaoqing Gao
32
0
0
25 Mar 2025
Gemma 3 Technical Report
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
82
24
0
25 Mar 2025
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
Aabid Karim
Abdul Karim
Bhoomika Lohana
Matt Keon
Jaswinder Singh
A. Sattar
42
0
0
23 Mar 2025
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero
Michael Hersche
Roger Wattenhofer
A. Sebastian
Abbas Rahimi
LRM
43
1
0
14 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
44
0
0
13 Mar 2025
Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models
Afrar Jahin
Arif Hassan Zidan
Yu Bao
Shizhe Liang
T. Liu
W. Zhang
LRM
61
1
0
13 Mar 2025
Effectiveness of Zero-shot-CoT in Japanese Prompts
Shusuke Takayama
Ian Frank
LRM
39
0
0
09 Mar 2025
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVM
ELM
62
3
0
07 Mar 2025
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao
Ting Zhang
Jiayu Sun
Mi Tian
Hua Huang
34
0
0
07 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
56
2
0
03 Mar 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Z. Wang
Yang Zhou
ELM
VLM
LRM
43
1
0
27 Feb 2025
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
Kaustubh Vyas
D. Graux
Sébastien Montella
P. Vougiouklis
Ruofei Lai
Keshuang Li
Yang Ren
Jeff Z. Pan
LLMAG
ELM
63
1
0
27 Feb 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
60
0
0
27 Feb 2025
General Reasoning Requires Learning to Reason from the Get-go
General Reasoning Requires Learning to Reason from the Get-go
Seungwook Han
Jyothish Pari
Samuel J. Gershman
Pulkit Agrawal
LRM
61
0
0
26 Feb 2025
BIG-Bench Extra Hard
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
115
4
0
26 Feb 2025
Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features
Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features
Nikolai Rekut
Alexey Orlov
Klea Ziu
Elizaveta Starykh
Martin Takáč
Aleksandr Beznosikov
49
0
0
25 Feb 2025
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Patomporn Payoungkhamdee
Pume Tuchinda
Jinheon Baek
Samuel Cahyawijaya
Can Udomcharoenchaikit
Potsawee Manakul
Peerat Limkonchotiwat
E. Chuangsuwanich
Sarana Nutanong
LRM
41
0
0
25 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
36
4
0
24 Feb 2025
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
...
Anikait Singh
Chase Blagden
Violet Xiang
Dakota Mahan
Nick Haber
OffRL
LRM
45
4
0
24 Feb 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
44
1
0
24 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
30
2
0
23 Feb 2025
InductionBench: LLMs Fail in the Simplest Complexity Class
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua
Tyler Wong
Sun Fei
Liangming Pan
Adam Jardine
William Yang Wang
LRM
48
2
0
20 Feb 2025
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Daniel J.H. Chung
Zhiqi Gao
Yurii Kvasiuk
Tianyi Li
Moritz Münchmeyer
Maja Rudolph
Frederic Sala
Sai Chaitanya Tadepalli
AIMat
39
3
0
19 Feb 2025
12
Next