ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.04213
  4. Cited By
Does Localization Inform Editing? Surprising Differences in
  Causality-Based Localization vs. Knowledge Editing in Language Models

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

10 January 2023
Peter Hase
Mohit Bansal
Been Kim
Asma Ghandeharioun
    MILM
ArXivPDFHTML

Papers citing "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models"

50 / 143 papers shown
Title
Can Editing LLMs Inject Harm?
Can Editing LLMs Inject Harm?
Canyu Chen
Baixiang Huang
Zekun Li
Zhaorun Chen
Shiyang Lai
...
Xifeng Yan
William Wang
Philip H. S. Torr
Dawn Song
Kai Shu
KELM
38
11
0
29 Jul 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
32
10
0
27 Jul 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
47
27
0
22 Jul 2024
How and where does CLIP process negation?
How and where does CLIP process negation?
Vincent Quantmeyer
Pablo Mosteiro
Albert Gatt
CoGe
29
6
0
15 Jul 2024
Composable Interventions for Language Models
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
80
5
0
09 Jul 2024
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for
  Interpreting Neural Networks
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
CML
23
10
0
05 Jul 2024
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures
  Reveal Internal Characteristics of Meta's Llama 2 Model
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
22
0
0
04 Jul 2024
The Remarkable Robustness of LLMs: Stages of Inference?
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
33
33
0
27 Jun 2024
Do LLMs dream of elephants (when told not to)? Latent concept
  association and associative memory in transformers
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
29
6
0
26 Jun 2024
How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?
How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?
Huaizhi Ge
Frank Rudzicz
Zining Zhu
KELM
43
4
0
25 Jun 2024
Beyond Individual Facts: Investigating Categorical Knowledge Locality of
  Taxonomy and Meronomy Concepts in GPT Models
Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models
Christopher Burger
Yifan Hu
Thai Le
KELM
34
0
0
22 Jun 2024
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Yihuai Hong
Lei Yu
Shauli Ravfogel
Haiqin Yang
Mor Geva
KELM
MU
58
17
0
17 Jun 2024
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Somnath Banerjee
Avik Halder
Rajarshi Mandal
Sayan Layek
Ian Soboroff
Rima Hazra
Animesh Mukherjee
52
0
0
17 Jun 2024
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained
  Language Model for Knowledge Editing and Fine-tuning
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Haoyu Wang
Tianci Liu
Ruirui Li
Monica Cheng
Tuo Zhao
Jing Gao
29
7
0
16 Jun 2024
Memorization in deep learning: A survey
Memorization in deep learning: A survey
Jiaheng Wei
Yanjun Zhang
Leo Yu Zhang
Ming Ding
Chao Chen
Kok-Leong Ong
Jun Zhang
Yang Xiang
40
6
0
06 Jun 2024
Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of
  Knowledge Editing in Large Language Models
Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models
Cheng-Hsun Hsueh
Paul Kuo-Ming Huang
Tzu-Han Lin
Che-Wei Liao
Hung-Chieh Fang
Chao-Wei Huang
Yun-Nung Chen
KELM
31
5
0
03 Jun 2024
Position: An Inner Interpretability Framework for AI Inspired by Lessons
  from Cognitive Neuroscience
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas
Federico Adolfi
David Poeppel
Gemma Roig
35
5
0
03 Jun 2024
Knowledge Circuits in Pretrained Transformers
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
64
19
0
28 May 2024
Perturbation-Restrained Sequential Model Editing
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
53
8
0
27 May 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao-quan Song
33
0
0
09 May 2024
What does the Knowledge Neuron Thesis Have to do with Knowledge?
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu
Andrew Liu
Zining Zhu
Gerald Penn
36
30
0
03 May 2024
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model
  Editing with Llama-3
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Junsang Yoon
Akshat Gupta
Gopala Anumanchipalli
23
5
0
01 May 2024
Revealing the Parametric Knowledge of Language Models: A Unified
  Framework for Attribution Methods
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu
Pepa Atanasova
Isabelle Augenstein
KELM
29
4
0
29 Apr 2024
How to use and interpret activation patching
How to use and interpret activation patching
Stefan Heimersheim
Neel Nanda
17
36
0
23 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
38
111
0
22 Apr 2024
DyKnow:Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
DyKnow:Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
30
5
0
10 Apr 2024
Locating and Editing Factual Associations in Mamba
Locating and Editing Factual Associations in Mamba
Arnab Sen Sharma
David Atkinson
David Bau
KELM
68
28
0
04 Apr 2024
Localizing Paragraph Memorization in Language Models
Localizing Paragraph Memorization in Language Models
Niklas Stoehr
Mitchell Gordon
Chiyuan Zhang
Owen Lewis
MU
38
13
0
28 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
41
79
0
26 Mar 2024
Detoxifying Large Language Models via Knowledge Editing
Detoxifying Large Language Models via Knowledge Editing
Meng Wang
Ningyu Zhang
Ziwen Xu
Zekun Xi
Shumin Deng
Yunzhi Yao
Qishen Zhang
Linyi Yang
Jindong Wang
Huajun Chen
KELM
38
54
0
21 Mar 2024
Larimar: Large Language Models with Episodic Memory Control
Larimar: Large Language Models with Episodic Memory Control
Payel Das
Subhajit Chaudhury
Elliot Nelson
Igor Melnyk
Sarath Swaminathan
...
Vijil Chenthamarakshan
Jiří
Jirí Navrátil
Soham Dan
Pin-Yu Chen
CLL
KELM
35
18
0
18 Mar 2024
Monotonic Representation of Numeric Properties in Language Models
Monotonic Representation of Numeric Properties in Language Models
Benjamin Heinzerling
Kentaro Inui
KELM
MILM
40
9
0
15 Mar 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to
  components
AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár
Tom Lieberum
Rohin Shah
Neel Nanda
KELM
43
42
0
01 Mar 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language
  Model Representations
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
55
26
0
27 Feb 2024
TruthX: Alleviating Hallucinations by Editing Large Language Models in
  Truthful Space
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
Shaolei Zhang
Tian Yu
Yang Feng
HILM
KELM
29
39
0
27 Feb 2024
InstructEdit: Instruction-based Knowledge Editing for Large Language
  Models
InstructEdit: Instruction-based Knowledge Editing for Large Language Models
Ningyu Zhang
Bo Tian
Siyuan Cheng
Xiaozhuan Liang
Yi Hu
Kouying Xue
Yanjie Gou
Xi Chen
Huajun Chen
KELM
40
4
0
25 Feb 2024
Understanding and Patching Compositional Reasoning in LLMs
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li
Gangwei Jiang
Hong Xie
Linqi Song
Defu Lian
Ying Wei
LRM
46
20
0
22 Feb 2024
Event-level Knowledge Editing
Event-level Knowledge Editing
Hao Peng
Xiaozhi Wang
Chunyang Li
Kaisheng Zeng
Jiangshan Duo
Yixin Cao
Lei Hou
Juanzi Li
KELM
37
5
0
20 Feb 2024
Stable Knowledge Editing in Large Language Models
Stable Knowledge Editing in Large Language Models
Zihao Wei
Liang Pang
Hanxing Ding
Jingcheng Deng
Huawei Shen
Xueqi Cheng
KELM
77
9
0
20 Feb 2024
Learning to Edit: Aligning LLMs with Knowledge Editing
Learning to Edit: Aligning LLMs with Knowledge Editing
Yuxin Jiang
Yufei Wang
Chuhan Wu
Wanjun Zhong
Xingshan Zeng
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Qun Liu
Wei Wang
KELM
27
19
0
19 Feb 2024
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
Shu Yang
Muhammad Asif Ali
Cheng-Long Wang
Lijie Hu
Di Wang
CLL
MoE
32
38
0
17 Feb 2024
Navigating the Dual Facets: A Comprehensive Evaluation of Sequential
  Memory Editing in Large Language Models
Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models
Zihao Lin
Mohammad Beigi
Hongxuan Li
Yufan Zhou
Yuxiang Zhang
Qifan Wang
Wenpeng Yin
Lifu Huang
KELM
16
8
0
16 Feb 2024
Towards Uncovering How Large Language Model Works: An Explainability
  Perspective
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Mengnan Du
35
10
0
16 Feb 2024
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation
  Tuning with Plausibility Estimation
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation
Zhaowei Wang
Wei Fan
Qing Zong
Hongming Zhang
Sehyun Choi
Tianqing Fang
Xin Liu
Yangqiu Song
Ginny Y. Wong
Simon See
46
13
0
16 Feb 2024
Long-form evaluation of model editing
Long-form evaluation of model editing
Domenic Rosati
Robie Gonzales
Jinkun Chen
Xuemin Yu
Melis Erkan
Yahya Kayani
Satya Deepika Chavatapalli
Frank Rudzicz
Hassan Sajjad
KELM
14
10
0
14 Feb 2024
Rethinking Machine Unlearning for Large Language Models
Rethinking Machine Unlearning for Large Language Models
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
...
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
AILaw
MU
65
81
0
13 Feb 2024
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Bilal Chughtai
Alan Cooney
Neel Nanda
HILM
KELM
22
16
0
11 Feb 2024
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language
  Models
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
KELM
26
17
0
19 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations
  of Language Models
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
25
87
0
11 Jan 2024
Model Editing Harms General Abilities of Large Language Models:
  Regularization to the Rescue
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
Jia-Chen Gu
Haoyang Xu
Jun-Yu Ma
Pan Lu
Zhen-Hua Ling
Kai-Wei Chang
Nanyun Peng
KELM
25
34
0
09 Jan 2024
Previous
123
Next