Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
5 July 2023
Zhaofeng Wu
Linlu Qiu
Alexis Ross
Ekin Akyürek
Boyuan Chen
Bailin Wang
Najoung Kim
Jacob Andreas
Yoon Kim
LRM, ReLM
ArXiv (abs) | PDF | HTML | Github (521★)

Papers citing "Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks"

50 / 185 papers shown
A perceptual bias of AI Logical Argumentation Ability in Writing
Xi Cun
Jifan Ren
Asha Huang
Siyu Li
Ruzhen Song
86
0
0
27 Nov 2025
SPHINX: A Synthetic Environment for Visual Perception and Reasoning
Md Tanvirul Alam
Saksham Aggarwal
Justin Yang Chae
Nidhi Rastogi
ObjD, ReLM, LRM
370
0
0
25 Nov 2025
DEVAL: A Framework for Evaluating and Improving the Derivation Capability of Large Language Models
Y. Li
Qin Li
Min Zhang
Min Zhang
LRM
262
0
0
18 Nov 2025
How Well Do LLMs Understand Drug Mechanisms? A Knowledge + Reasoning Evaluation Dataset
Sunil Mohan
Theofanis Karaletsos
151
1
0
09 Nov 2025
Next-Latent Prediction Transformers Learn Compact World Models
Jayden Teoh
Manan Tomar
Kwangjun Ahn
E. Hu
Pratyusha Sharma
Riashat Islam
Alex Lamb
John Langford
214
3
0
08 Nov 2025
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Md Tanvirul Alam
Nidhi Rastogi
OffRL, LRM
153
3
0
30 Oct 2025
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
Ken Gu
Advait Bhat
Mike A. Merrill
Robert West
Xin Liu
Daniel J. McDuff
Tim Althoff
KELM, LRM
303
2
0
28 Oct 2025
Code-enabled language models can outperform reasoning models on diverse tasks
Cedegao E. Zhang
Cédric Colas
Gabriel Poesia
Joshua B. Tenenbaum
Jacob Andreas
ReLM, ALM, LRM, AI4CE
244
3
0
23 Oct 2025
The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts
Sangmitra Madhusudan
Kaige Chen
Ali Emami
ELM, LRM
172
0
0
23 Oct 2025
Doing Things with Words: Rethinking Theory of Mind Simulation in Large Language Models
A. Lombardi
Alessandro Lenci
LLMAG
165
1
0
15 Oct 2025
Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models
Samuel Lippl
Thomas McGee
Kimberly Lopez
Ziwen Pan
Pierce Zhang
Salma Ziadi
Oliver Eberle
Ida Momennejad
LRM
181
0
0
13 Oct 2025
A Survey of Inductive Reasoning for Large Language Models
Kedi Chen
Dezhao Ruan
Yuhao Dan
Y. Wang
Siyu Yan
...
Biqing Qi
Linyang Li
Qipeng Guo
Xiaoming Shi
Wei-na Zhang
ReLM, LRM, ELM, AI4CE
232
4
0
11 Oct 2025
CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language Models
Yong-En Tian
Yu-Chien Tang
An-Zi Yen
Wen-Chih Peng
153
7
0
08 Oct 2025
PoseGaze-AHP: A Knowledge-Based 3D Dataset for AI-Driven Ocular and Postural Diagnosis
Saja Al-Dabet
Sherzod Turaev
Nazar Zaki
Arif O. Khan
Luai Eldweik
102
1
0
04 Oct 2025
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
Aniket Vashishtha
Qirun Dai
Hongyuan Mei
Amit Sharma
Chenhao Tan
Hao Peng
LRM
213
0
0
02 Oct 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM, ELM
186
5
0
30 Sep 2025
Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
Luisa Geiger
Mareike Hartmann
Michael Sullivan
Alexander Koller
121
0
0
29 Sep 2025
Review of Hallucination Understanding in Large Language and Vision Models
Zhengyi Ho
Siyuan Liang
D. Tao
VLM, LRM
185
2
0
26 Sep 2025
Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
Tyler Loakman
William Thorne
Chenghua Lin
LRM
199
4
0
25 Sep 2025
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
Yeongbin Seo
Gayoung Kim
Jaehyung Kim
Jinyoung Yeo
234
0
0
23 Sep 2025
Robustness of Neurosymbolic Reasoners on First-Order Logic Problems
Hannah Bansal
Kemal Kurniawan
Lea Frermann
OffRL, LRM
174
0
0
22 Sep 2025
Statistical Methods in Generative AI
Edgar Dobriban
336
3
0
08 Sep 2025
The Need for Verification in AI-Driven Scientific Discovery
Cristina Cornelio
Takuya Ito
Ryan Cory-Wright
S. Dash
L. Horesh
237
4
0
01 Sep 2025
Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery
Jiaqi Liu
Songning Lai
P. Li
Di Yu
Wenjie Zhou
...
L. Bai
Xuming He
Mingyu Ding
Huaxiu Yao
Aoran Wang
155
5
0
24 Aug 2025
Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
Sylvio Rüdian
Yassin Elsir
Marvin Kretschmer
Sabine Cayrou
Niels Pinkwart
122
0
0
15 Aug 2025
Grounding Natural Language for Multi-agent Decision-Making with Multi-agentic LLMs
Dom Huh
P. Mohapatra
LLMAG, LM&Ro
101
0
0
10 Aug 2025
Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
Huihan Li
You Chen
Siyuan Wang
Yixin He
Ninareh Mehrabi
Rahul Gupta
Xiang Ren
LRM
325
4
0
04 Aug 2025
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data
Sohaib Imran
Rob Lamb
Peter M. Atkinson
197
1
0
01 Aug 2025
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
Qinyuan Wu
Soumi Das
Mahsa Amani
Bishwamittra Ghosh
Mohammad Aflah Khan
Krishna P. Gummadi
Muhammad Bilal Zafar
241
3
0
29 Jul 2025
How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation under the One-Time-Pad-Based Framework
Zi Liang
Liantong Yu
Shiyu Zhang
Qingqing Ye
Haibo Hu
ELM
307
2
0
25 Jul 2025
Adaptive Multi-Agent Reasoning via Automated Workflow Generation
Humza Sami
Mubashir ul Islam
P. Gaillardon
V. Tenace
LLMAG, LRM
152
1
0
18 Jul 2025
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Qinyuan Ye
Robin Jia
Xiang Ren
LRM, ELM
245
2
0
14 Jul 2025
Measuring Intent Comprehension in LLMs
Nadav Kunievsky
James A. Evans
243
1
0
19 Jun 2025
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Xinnuo Xu
Rachel Lawrence
Kshitij Dubey
Atharva Pandey
Risa Ueno
Fabian Falck
A. Nori
Rahul Sharma
Amit Sharma
Javier González
LRM
326
6
0
18 Jun 2025
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback
Dongwei Jiang
Alvin Zhang
Andrew Wang
Nicholas Andrews
Daniel Khashabi
LRM
286
4
0
13 Jun 2025
BF-Max: an Efficient Bit Flipping Decoder with Predictable Decoding Failure Rate
International Symposium on Information Theory (ISIT), 2025
Alessio Baldelli
Marco Baldi
F. Chiaraluce
Paolo Santini
429
2
0
11 Jun 2025
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou
Yin Xu
Yingtao Zhu
Chuxi Xiao
Haiyang Bian
Lei Wei
Xuegong Zhang
LM&MA, VLM
287
6
0
30 May 2025
Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors
Harish Tayyar Madabushi
Melissa Torgbi
C. Bonial
460
4
0
29 May 2025
Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds
Ishwar B Balappanawar
Vamshi Krishna Bonagiri
Anish Joishy
Manas Gaur
K. Thirunarayan
Ponnurangam Kumaraguru
ReLM, LRM
329
0
0
28 May 2025
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
Qingchuan Ma
Yuhang Wu
Xiawu Zheng
Rongrong Ji
274
2
0
28 May 2025
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziling Cheng
Meng Cao
Marc-Antoine Rondeau
Jackie Chi Kit Cheung
LRM
378
3
0
28 May 2025
Two Causally Related Needles in a Video Haystack
Miaoyu Li
Qin Chao
Boyang Albert Li
CML
363
0
0
26 May 2025
Recalibrating the Compass: Integrating Large Language Models into Classical Research Methods
Tai-Quan Peng
Xuzhen Yang
334
3
0
26 May 2025
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Yanbo Zhang
S. Khan
Adnan Mahmud
Huck Yang
Alexander Lavin
...
James A. Evans
Alan R. Bundy
Jannis Brugger
Jesper Tegner
Hector Zenil
LM&MA
471
7
0
22 May 2025
Causal Cartographer: From Mapping to Reasoning Over Counterfactual Worlds
Gaël Gendron
Jože M. Rožanec
Michael Witbrock
Gillian Dobbie
CML
291
2
0
20 May 2025
Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning
Adam Štorek
Mukur Gupta
Samira Hajizadeh
Prashast Srivastava
Suman Jana
LRM
342
3
0
19 May 2025
A Minimum Description Length Approach to Regularization in Neural Networks
Matan Abudy
Orr Well
Emmanuel Chemla
Roni Katzir
Nur Lan
340
2
0
19 May 2025
Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding
Siyang Wu
Honglin Bao
Nadav Kunievsky
James A. Evans
553
0
0
18 May 2025
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Akarsh Kumar
Jeff Clune
Joel Lehman
Kenneth O. Stanley
OOD
364
18
0
16 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
776
5
0
05 May 2025