Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations

27 March 2024
Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong
HILM
arXiv:2403.18167

Papers citing "Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations"

13 papers shown
• Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
  Leon Eshuijs, Shihan Wang, Antske Fokkens (09 May 2025)
• Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
  Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Y. Liu, Jian Kang, Jieyu Zhao (30 Mar 2025) [KELM]
• ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
  Chung-En Sun, Ge Yan, Tsui-Wei Weng (27 Mar 2025) [KELM, LRM]
• Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
  Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda (18 Mar 2025)
• Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
  Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda (21 Nov 2024)
• Mitigating Large Language Model Hallucination with Faithful Finetuning
  Minda Hu, Bowei He, Yufei Wang, Liangyou Li, Chen-li Ma, Irwin King (17 Jun 2024) [HILM]
• Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models
  Zhuoran Jin, Pengfei Cao, Hongbang Yuan, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao (28 Feb 2024)
• How Language Model Hallucinations Can Snowball
  Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith (22 May 2023) [HILM, LRM]
• Dissecting Recall of Factual Associations in Auto-Regressive Language Models
  Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson (28 Apr 2023) [KELM]
• Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)
• Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization
  Mengyao Cao, Yue Dong, Jackie C.K. Cheung (30 Aug 2021) [HILM]
• Measuring and Improving Consistency in Pretrained Language Models
  Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard H. Hovy, Hinrich Schütze, Yoav Goldberg (01 Feb 2021) [HILM]
• Language Models as Knowledge Bases?
  Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel (03 Sep 2019) [KELM, AI4MH]