ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.18154
  4. Cited By
Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and
  Mitigating Knowledge Conflicts in Language Models

Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models

28 February 2024
Zhuoran Jin
Pengfei Cao
Hongbang Yuan
Yubo Chen
Jiexin Xu
Huaijun Li
Xiaojian Jiang
Kang Liu
Jun Zhao
ArXivPDFHTML

Papers citing "Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models"

10 / 10 papers shown
Title
Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
Sangwon Yu
Jongyoon Song
Bongkyu Hwang
Hoyoung Kang
Sooah Cho
Junhwa Choi
Seongho Joe
Taehee Lee
Youngjune Gwon
Sungroh Yoon
69
4
0
31 Jul 2024
Knowledge Circuits in Pretrained Transformers
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
36
19
0
28 May 2024
Tug-of-War Between Knowledge: Exploring and Resolving Knowledge
  Conflicts in Retrieval-Augmented Language Models
Tug-of-War Between Knowledge: Exploring and Resolving Knowledge Conflicts in Retrieval-Augmented Language Models
Zhuoran Jin
Pengfei Cao
Yubo Chen
Kang Liu
Xiaojian Jiang
Jiexin Xu
Qiuxia Li
Jun Zhao
179
10
0
22 Feb 2024
Attention Lens: A Tool for Mechanistically Interpreting the Attention
  Head Information Retrieval Mechanism
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia
Arham Khan
Aswathy Ajith
Daniel Grzenda
Nathaniel Hudson
André Bauer
Kyle Chard
Ian T. Foster
164
6
0
25 Oct 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large
  Language Models in Knowledge Conflicts
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
198
75
0
22 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical
  abilities in a pre-trained language model
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
167
116
0
30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language
  Models
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
180
152
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object
  Identification in GPT-2 small
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022
Entity-Based Knowledge Conflicts in Question Answering
Entity-Based Knowledge Conflicts in Question Answering
Shayne Longpre
Kartik Perisetla
Anthony Chen
Nikhil Ramesh
Chris DuBois
Sameer Singh
HILM
230
177
0
10 Sep 2021
Consistent Accelerated Inference via Confident Adaptive Transformers
Consistent Accelerated Inference via Confident Adaptive Transformers
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
177
60
0
18 Apr 2021
1