A Causal Framework to Quantify the Robustness of Mathematical Reasoning
with Language Models

A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models

21 October 2022

Alessandro Stolfo

Bernhard Schölkopf

Mrinmaya Sachan

Papers citing "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models"

13 / 13 papers shown

Title
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Andreas Opedal Haruki Shirakami Bernhard Schölkopf Abulhair Saparov Mrinmaya Sachan LRM 54 1 0 17 Feb 2025
Number Cookbook: Number Understanding of Language Models and How to Improve It Haotong Yang Yi Hu Shijia Kang Zhouchen Lin Muhan Zhang LRM 41 2 0 06 Nov 2024
Causal Inference with Large Language Model: A Survey Jing Ma CML LRM 68 8 0 15 Sep 2024
Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions Jordan Meadows Tamsin James André Freitas ReLM LRM AI4CE 31 1 0 29 Apr 2024
Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs Zhenlan Ji Daoyuan Wu Pingchuan Ma Zongjie Li Shuai Wang LLMAG 40 3 0 27 Apr 2024
CLadder: Assessing Causal Reasoning in Language Models Zhijing Jin Yuen Chen Felix Leeb Luigi Gresele Ojasv Kamal ... Kevin Blin Fernando Gonzalez Adauto Max Kleiman-Weiner Mrinmaya Sachan Bernhard Schölkopf ReLM ELM LRM 38 62 0 07 Dec 2023
World Models for Math Story Problems Andreas Opedal Niklas Stoehr Abulhair Saparov Mrinmaya Sachan ReLM 36 12 0 07 Jun 2023
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility Wen-song Ye Mingfeng Ou Tianyi Li Yipeng Chen Xuetao Ma ... Sai Wu Jie Fu Gang Chen Haobo Wang J. Zhao 42 36 0 15 May 2023
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 291 4,048 0 24 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 303 11,881 0 04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 315 8,402 0 28 Jan 2022
Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model Kuntal Kumar Pal Chitta Baral 82 18 0 10 Sep 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 248 1,986 0 31 Dec 2020