2403.18167
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
27 March 2024
Lei Yu
Meng Cao
Jackie Chi Kit Cheung
Yue Dong
HILM
Papers citing
"Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations"
13 / 13 papers shown
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
19
0
0
09 May 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Y. Liu
Jian Kang
Jieyu Zhao
KELM
54
0
0
30 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
60
0
0
27 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
46
1
0
18 Mar 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
69
10
0
21 Nov 2024
Mitigating Large Language Model Hallucination with Faithful Finetuning
Minda Hu
Bowei He
Yufei Wang
Liangyou Li
Chen-li Ma
Irwin King
HILM
27
6
0
17 Jun 2024
Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models
Zhuoran Jin
Pengfei Cao
Hongbang Yuan
Yubo Chen
Jiexin Xu
Huaijun Li
Xiaojian Jiang
Kang Liu
Jun Zhao
178
32
0
28 Feb 2024
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
78
246
0
22 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
260
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization
Mengyao Cao
Yue Dong
Jackie C.K. Cheung
HILM
170
144
0
30 Aug 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
258
343
0
01 Feb 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
396
2,576
0
03 Sep 2019