Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.04388
Cited By
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
7 May 2023
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting"
50 / 57 papers shown
Title
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
65
7
1
08 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
Zirong Chen
Ziyan An
Jennifer Reynolds
Kristin Mullen
Stephen Martini
Meiyi Ma
27
0
0
06 May 2025
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
R. Manuvinakurike
Emanuel Moss
E. A. Watkins
Saurav Sahay
G. Raffa
L. Nachman
LRM
24
0
0
01 May 2025
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde
Louis Jaburi
MILM
74
1
0
01 May 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
77
0
0
30 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
52
0
0
28 Apr 2025
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering
S.
Mohamad Hussein
Ann Barcomb
Mohammad Moshirpour
45
0
0
27 Apr 2025
Evolution of AI in Education: Agentic Workflows
Firuz Kamalov
David Santandreu Calonge
Linda Smail
Dilshod Azizov
Dimple R. Thadani
Theresa Kwong
Amara Atif
43
0
0
25 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Z. Chen
Z. Z. Ren
Maarten de Rijke
34
0
0
05 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
62
6
0
11 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
50
0
0
08 Mar 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur
Marian Qian
Paul Pu Liang
Louis-Philippe Morency
LRM
79
1
0
21 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal
Haruki Shirakami
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
54
1
0
17 Feb 2025
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
Libo Wang
LRM
58
0
0
07 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
59
1
0
07 Feb 2025
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
E. Plaku
Ziyu Yao
LRM
72
1
0
27 Nov 2024
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
Elita Lobo
Chirag Agarwal
Himabindu Lakkaraju
LRM
70
5
0
22 Nov 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
40
7
0
25 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou
Jie He
Lanyu Chen
Jingyu Li
Haojing Chen
Víctor Gutiérrez-Basulto
Jeff Z. Pan
H. Chen
LRM
49
1
0
18 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis
Alexandros Potamianos
Kristina Lerman
Shrikanth Narayanan
25
0
0
17 Oct 2024
FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan
Pasquale Minervini
Pat Verga
Patrick Lewis
Isabelle Augenstein
ReLM
LRM
59
2
0
14 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
Joshua Ong Jun Leang
Aryo Pradipta Gema
Shay B. Cohen
ReLM
LRM
ReCod
31
2
0
14 Oct 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu
Wenjiang Xu
Weizhe Huang
Yuting Zeng
Jiaxing Wang
Hailong Yang
Hailong Yang
Jing Li
LRM
ReLM
39
5
0
26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
70
18
0
02 Jul 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
49
3
0
23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
44
33
0
22 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker
Stefano Soatto
32
6
0
05 Jun 2024
Break the Chain: Large Language Models Can be Shortcut Reasoners
Mengru Ding
Hanmeng Liu
Zhizhang Fu
Jian Song
Wenbo Xie
Yue Zhang
KELM
LRM
27
7
0
04 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
32
0
0
04 Jun 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
59
2
0
24 May 2024
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
S. Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
26
3
0
21 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM
LM&Ro
54
37
0
08 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
79
3
0
29 Apr 2024
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models
Bradley Paul Allen
Paul T. Groth
21
3
0
25 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
87
28
0
16 Apr 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
54
17
0
28 Feb 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
Timothy Baldwin
LRM
17
27
0
28 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
75
0
25 Jan 2024
Evaluating Language Model Agency through Negotiations
Tim R. Davidson
V. Veselovsky
Martin Josifoski
Maxime Peyrard
Antoine Bosselut
Michal Kosinski
Robert West
LLMAG
24
21
0
09 Jan 2024
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen
Ondvrej Kvapil
ELM
AAML
11
3
0
26 Nov 2023
Self-Contradictory Reasoning Evaluation and Detection
Ziyi Liu
Isabelle G. Lee
Yongkang Du
Soumya Sanyal
Jieyu Zhao
LRM
28
2
0
16 Nov 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
209
178
0
20 Oct 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
13
46
0
17 Jul 2023
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Wei-Lin Chen
Cheng-Kuang Wu
Yun-Nung Chen
Hsin-Hsi Chen
8
27
0
24 May 2023
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
Oyvind Tafjord
Bhavana Dalvi
Peter Clark
ReLM
KELM
LRM
57
52
0
21 Oct 2022
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
Jacob Eisenstein
D. Andor
Bernd Bohnet
Michael Collins
David M. Mimno
LRM
181
24
0
05 Oct 2022
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
104
105
0
22 Sep 2022
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Aman Madaan
Amir Yazdanbakhsh
LRM
139
115
0
16 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
29
180
0
30 Aug 2022
1
2
Next