2307.02477
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
5 July 2023
Zhaofeng Wu
Linlu Qiu
Alexis Ross
Ekin Akyürek
Boyuan Chen
Bailin Wang
Najoung Kim
Jacob Andreas
Yoon Kim
LRM
ReLM
Papers citing
"Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks"
37 / 37 papers shown
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
60
0
0
05 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
50
0
0
23 Apr 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
45
1
0
21 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
74
7
0
10 Feb 2025
Can Large Language Models Understand Intermediate Representations?
Hailong Jiang
Jianfeng Zhu
Yao Wan
B. Fang
Hongyu Zhang
Ruoming Jin
Qiang Guan
48
1
0
07 Feb 2025
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models
Patrick Y. Wu
Jonathan Nagler
Joshua A. Tucker
Solomon Messing
LRM
39
2
0
28 Jan 2025
Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman
Elliot Meyerson
Tarek El-Gaaly
Kenneth O. Stanley
Tarin Ziyaee
75
1
0
22 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
67
4
0
31 Dec 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
38
7
0
25 Oct 2024
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık
R. G. Cinbis
Ebru Aydin Gol
21
0
0
22 Oct 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin
Shaoguang Mao
Emanuele La Malfa
Valentin Hofmann
Adrian de Wynter
Jing Yao
Si-Qing Chen
Michael Wooldridge
Furu Wei
40
2
0
14 Oct 2024
Reasoning Elicitation in Language Models via Counterfactual Feedback
Alihan Hüyük
Xinnuo Xu
Jacqueline Maasch
Aditya V. Nori
Javier González
ReLM
LRM
47
1
0
02 Oct 2024
Counterfactual Token Generation in Large Language Models
Ivi Chatzi
N. C. Benz
Eleni Straitouri
Stratis Tsirtsis
Manuel Gomez Rodriguez
LRM
31
3
0
25 Sep 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
47
6
1
10 Jul 2024
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning
Silin Meng
Yiwei Wang
Cheng-Fu Yang
Nanyun Peng
Kai-Wei Chang
31
16
0
20 Jun 2024
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Kang-il Lee
Minbeom Kim
Seunghyun Yoon
Minsung Kim
Dongryeol Lee
Hyukhun Koh
Kyomin Jung
CoGe
VLM
67
5
0
13 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
37
10
0
12 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
32
0
0
04 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
52
24
0
04 Jun 2024
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee
Kazem Meidani
Shashank Gupta
A. Farimani
Chandan K. Reddy
37
13
0
29 Apr 2024
On the generalization capacity of neural networks during generic multimodal reasoning
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
30
2
0
26 Jan 2024
A Vision Check-up for Language Models
Pratyusha Sharma
Tamar Rott Shaham
Manel Baradad
Stephanie Fu
Adrian Rodriguez-Munoz
Shivam Duggal
Phillip Isola
Antonio Torralba
VLM
LRM
75
8
0
03 Jan 2024
Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
Genglin Liu
Xingyao Wang
Lifan Yuan
Yangyi Chen
Hao Peng
24
15
0
16 Nov 2023
Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains
Marcio Fonseca
Shay B. Cohen
ALM
11
6
0
15 Nov 2023
GLoRE: Evaluating Logical Reasoning of Large Language Models
Hanmeng Liu
Zhiyang Teng
Ruoxi Ning
Jian Liu
Qiji Zhou
Yuexin Zhang
Yue Zhang
ReLM
ELM
LRM
55
6
0
13 Oct 2023
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
8
51
0
28 Sep 2023
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen
Y. Gal
Tom Rainforth
12
27
0
23 Jul 2023
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
75
246
0
22 May 2023
Large Linguistic Models: Investigating LLMs' metalinguistic abilities
G. Beguš
M. Dąbkowski
Ryan Rhodes
LRM
29
18
0
01 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,232
0
22 Mar 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
188
624
0
26 Feb 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
291
0
24 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020