Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

7 May 2023

Papers citing "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting"

50 / 57 papers shown

Title
Reasoning Models Don't Always Say What They Think Yanda Chen Joe Benton Ansh Radhakrishnan Jonathan Uesato Carson E. Denison ... Vlad Mikulik Samuel R. Bowman Jan Leike Jared Kaplan E. Perez ReLM LRM 65 7 1 08 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration Zirong Chen Ziyan An Jennifer Reynolds Kristin Mullen Stephen Martini Meiyi Ma 27 0 0 06 May 2025
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines R. Manuvinakurike Emanuel Moss E. A. Watkins Saurav Sahay G. Raffa L. Nachman LRM 24 0 0 01 May 2025
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i Kola Ayonrinde Louis Jaburi MILM 74 1 0 01 May 2025
Phi-4-reasoning Technical Report Marah Abdin Sahaj Agarwal Ahmed Hassan Awadallah Vidhisha Balachandran Harkirat Singh Behl ... Vaishnavi Shrivastava Vibhav Vineet Yue Wu Safoora Yousefi Guoqing Zheng ReLM LRM 77 0 0 30 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Joykirat Singh Raghav Magazine Yash Pandya A. Nambi LLMAG KELM OffRL LRM 52 0 0 28 Apr 2025
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering S. Mohamad Hussein Ann Barcomb Mohammad Moshirpour 45 0 0 27 Apr 2025
Evolution of AI in Education: Agentic Workflows Firuz Kamalov David Santandreu Calonge Linda Smail Dilshod Azizov Dimple R. Thadani Theresa Kwong Amara Atif 43 0 0 25 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making Yougang Lyu Shijie Ren Yue Feng Zihan Wang Z. Chen Z. Z. Ren Maarten de Rijke 34 0 0 05 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful Iván Arcuschin Jett Janiak Robert Krzyzanowski Senthooran Rajamanoharan Neel Nanda Arthur Conmy LRM ReLM 62 6 0 11 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models Panatchakorn Anantaprayoon Masahiro Kaneko Naoaki Okazaki LRM KELM 50 0 0 08 Mar 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models Leena Mathur Marian Qian Paul Pu Liang Louis-Philippe Morency LRM 79 1 0 21 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Andreas Opedal Haruki Shirakami Bernhard Schölkopf Abulhair Saparov Mrinmaya Sachan LRM 54 1 0 17 Feb 2025
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning Libo Wang LRM 58 0 0 07 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations Guanxu Chen Dongrui Liu Tao Luo Jing Shao LRM MILM 59 1 0 07 Feb 2025
Evaluating Vision-Language Models as Evaluators in Path Planning Mohamed Aghzal Xiang Yue E. Plaku Ziyu Yao LRM 72 1 0 27 Nov 2024
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning Elita Lobo Chirag Agarwal Himabindu Lakkaraju LRM 70 5 0 22 Nov 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina Yuan Gao Dokyun Lee Gordon Burtch Sina Fazelpour LRM 40 7 0 25 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps Xiongtao Zhou Jie He Lanyu Chen Jingyu Li Haojing Chen Víctor Gutiérrez-Basulto Jeff Z. Pan H. Chen LRM 49 1 0 18 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors Georgios Chochlakis Alexandros Potamianos Kristina Lerman Shrikanth Narayanan 25 0 0 17 Oct 2024
FLARE: Faithful Logic-Aided Reasoning and Exploration Erik Arakelyan Pasquale Minervini Pat Verga Patrick Lewis Isabelle Augenstein ReLM LRM 59 2 0 14 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning Joshua Ong Jun Leang Aryo Pradipta Gema Shay B. Cohen ReLM LRM ReCod 31 2 0 14 Oct 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models Tongxuan Liu Wenjiang Xu Weizhe Huang Yuting Zeng Jiaxing Wang Hailong Yang Hailong Yang Jing Li LRM ReLM 39 5 0 26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 70 18 0 02 Jul 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step Zezhong Wang Xingshan Zeng Weiwen Liu Yufei Wang Liangyou Li Yasheng Wang Lifeng Shang Xin Jiang Qun Liu Kam-Fai Wong LRM 49 3 0 23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Jannik Kossen Jiatong Han Muhammed Razzak Lisa Schut Shreshth A. Malik Yarin Gal HILM 44 33 0 22 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Aman Singh Thakur Kartik Choudhary Venkat Srinik Ramayapally Sankaran Vaidyanathan Dieuwke Hupkes ELM ALM 45 55 0 18 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations Evan Becker Stefano Soatto 32 6 0 05 Jun 2024
Break the Chain: Large Language Models Can be Shortcut Reasoners Mengru Ding Hanmeng Liu Zhizhang Fu Jian Song Wenbo Xie Yue Zhang KELM LRM 27 7 0 04 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap François Roewer-Després Jinyue Feng Zining Zhu Frank Rudzicz LRM 32 0 0 04 Jun 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks Jacob Russin Sam Whitman McGrath Danielle J. Williams Lotem Elber-Dorozko AI4CE 59 2 0 24 May 2024
Securing the Future of GenAI: Policy and Technology Mihai Christodorescu Craven S. Feizi Neil Zhenqiang Gong Mia Hoffmann ... Jessica Newman Emelia Probasco Yanjun Qi Khawaja Shams Turek SILM 26 3 0 21 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning Kaya Stechly Karthik Valmeekam Subbarao Kambhampati LRM LM&Ro 54 37 0 08 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? Letitia Parcalabescu Anette Frank MLLM CoGe VLM 79 3 0 29 Apr 2024
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models Bradley Paul Allen Paul T. Groth 21 3 0 25 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning Pengyu Cheng Tianhao Hu Han Xu Zhisong Zhang Yong Dai Lei Han Nan Du Nan Du Xiaolong Li SyDa LRM ReLM 87 28 0 16 Apr 2024
On the Challenges and Opportunities in Generative AI Laura Manduchi Kushagra Pandey Robert Bamler Ryan Cotterell Sina Daubener ... F. Wenzel Frank Wood Stephan Mandt Vincent Fortuin Vincent Fortuin 54 17 0 28 Feb 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting Masahiro Kaneko Danushka Bollegala Naoaki Okazaki Timothy Baldwin LRM 17 27 0 28 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 75 0 25 Jan 2024
Evaluating Language Model Agency through Negotiations Tim R. Davidson V. Veselovsky Martin Josifoski Maxime Peyrard Antoine Bosselut Michal Kosinski Robert West LLMAG 24 21 0 09 Jan 2024
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability Simon Lermen Ondvrej Kvapil ELM AAML 11 3 0 26 Nov 2023
Self-Contradictory Reasoning Evaluation and Detection Ziyi Liu Isabelle G. Lee Yongkang Du Soumya Sanyal Jieyu Zhao LRM 28 2 0 16 Nov 2023
Towards Understanding Sycophancy in Language Models Mrinank Sharma Meg Tong Tomasz Korbak D. Duvenaud Amanda Askell ... Oliver Rausch Nicholas Schiefer Da Yan Miranda Zhang Ethan Perez 209 178 0 20 Oct 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations Yanda Chen Ruiqi Zhong Narutatsu Ri Chen Zhao He He Jacob Steinhardt Zhou Yu Kathleen McKeown LRM 13 46 0 17 Jul 2023
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations Wei-Lin Chen Cheng-Kuang Wu Yun-Nung Chen Hsin-Hsi Chen 8 27 0 24 May 2023
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning Oyvind Tafjord Bhavana Dalvi Peter Clark ReLM KELM LRM 57 52 0 21 Oct 2022
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model Jacob Eisenstein D. Andor Bernd Bohnet Michael Collins David M. Mimno LRM 181 24 0 05 Oct 2022
Towards Faithful Model Explanation in NLP: A Survey Qing Lyu Marianna Apidianaki Chris Callison-Burch XAI 104 105 0 22 Sep 2022
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango Aman Madaan Amir Yazdanbakhsh LRM 139 115 0 16 Sep 2022
The Alignment Problem from a Deep Learning Perspective Richard Ngo Lawrence Chan Sören Mindermann 29 180 0 30 Aug 2022