Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

5 July 2023

Bailin Wang

Papers citing "Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks"

50 of 185 papers shown

A perceptual bias of AI Logical Argumentation Ability in Writing

27 Nov 2025

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

370

25 Nov 2025

DEVAL: A Framework for Evaluating and Improving the Derivation Capability of Large Language Models

262

18 Nov 2025

How Well Do LLMs Understand Drug Mechanisms? A Knowledge + Reasoning Evaluation Dataset

Sunil Mohan

Theofanis Karaletsos

151

09 Nov 2025

Next-Latent Prediction Transformers Learn Compact World Models

214

08 Nov 2025

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Md Tanvirul Alam

Nidhi Rastogi

OffRL LRM

153

30 Oct 2025

SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models

303

28 Oct 2025

Code-enabled language models can outperform reasoning models on diverse tasks

244

23 Oct 2025

The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts

172

23 Oct 2025

Doing Things with Words: Rethinking Theory of Mind Simulation in Large Language Models

A. Lombardi

Alessandro Lenci

LLMAG

165

15 Oct 2025

Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models

181

13 Oct 2025

A Survey of Inductive Reasoning for Large Language Models

...

232

11 Oct 2025

CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language Models

153

08 Oct 2025

PoseGaze-AHP: A Knowledge-Based 3D Dataset for AI-Driven Ocular and Postural Diagnosis

102

04 Oct 2025

Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code

213

02 Oct 2025

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

...

186

30 Sep 2025

Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions

121

29 Sep 2025

Review of Hallucination Understanding in Large Language and Vision Models

185

26 Sep 2025

Who's Laughing Now? An Overview of Computational Humour Generation and Explanation

199

25 Sep 2025

Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity

234

23 Sep 2025

Robustness of Neurosymbolic Reasoners on First-Order Logic Problems

174

22 Sep 2025

Statistical Methods in Generative AI

Edgar Dobriban

336

08 Sep 2025

The Need for Verification in AI-Driven Scientific Discovery

237

01 Sep 2025

Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

...

155

24 Aug 2025

Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning

122

15 Aug 2025

Grounding Natural Language for Multi-agent Decision-Making with Multi-agentic LLMs

Dom Huh

P. Mohapatra

LLMAG LM&Ro

101

10 Aug 2025

Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time

325

04 Aug 2025

Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data

Sohaib Imran

Rob Lamb

Peter M. Atkinson

197

01 Aug 2025

Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs

241

29 Jul 2025

How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation under the One-Time-Pad-Based Framework

307

25 Jul 2025

Adaptive Multi-Agent Reasoning via Automated Workflow Generation

152

18 Jul 2025

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

245

14 Jul 2025

Measuring Intent Comprehension in LLMs

Nadav Kunievsky

James A. Evans

243

19 Jun 2025

RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation

326

18 Jun 2025

Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

286

13 Jun 2025

BF-Max: an Efficient Bit Flipping Decoder with Predictable Decoding Failure Rate

International Symposium on Information Theory (ISIT), 2025

429

11 Jun 2025

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

287

30 May 2025

Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors

Harish Tayyar Madabushi

Melissa Torgbi

C. Bonial

460

29 May 2025

Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds

Ishwar B Balappanawar

Vamshi Krishna Bonagiri

Anish Joishy

Manas Gaur

K. Thirunarayan

Ponnurangam Kumaraguru

ReLM LRM

329

28 May 2025

Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective

274

28 May 2025

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Ziling Cheng

Meng Cao

Marc-Antoine Rondeau

Jackie Chi Kit Cheung

LRM

378

28 May 2025

Two Causally Related Needles in a Video Haystack

363

26 May 2025

Recalibrating the Compass: Integrating Large Language Models into Classical Research Methods

Tai-Quan Peng

Xuzhen Yang

334

26 May 2025

Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery

...

471

22 May 2025

Causal Cartographer: From Mapping to Reasoning Over Counterfactual Worlds

291

20 May 2025

Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning

342

19 May 2025

A Minimum Description Length Approach to Regularization in Neural Networks

340

19 May 2025

Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding

553

18 May 2025

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis

364

16 May 2025

Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards

Xiaobao Wu

LRM

776

05 May 2025