Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Neural Information Processing Systems (NeurIPS), 2023
10 January 2023
Peter Hase
Joey Tianyi Zhou
Been Kim
Asma Ghandeharioun
    MILM

Papers citing "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models"

34 / 84 papers shown
Title
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
302
23
0
27 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM MU
429
4
0
09 Jul 2024
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
CML
188
15
0
05 Jul 2024
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
179
0
0
04 Jul 2024
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
414
80
0
27 Jun 2024
How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?
Huaizhi Ge
Frank Rudzicz
Zining Zhu
KELM
241
4
0
25 Jun 2024
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Somnath Banerjee
Avik Halder
Rajarshi Mandal
Sayan Layek
Ian Soboroff
Rima Hazra
Animesh Mukherjee
452
2
0
17 Jun 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
321
41
0
28 May 2024
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
450
15
0
27 May 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao Song
320
1
0
09 May 2024
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu
Pepa Atanasova
Isabelle Augenstein
KELM
187
10
0
29 Apr 2024
How to use and interpret activation patching
Stefan Heimersheim
Neel Nanda
188
92
0
23 Apr 2024
Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah
Andrew Ilyas
Aleksander Madry
KELM
250
22
0
17 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
380
152
0
26 Mar 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
252
53
0
27 Feb 2024
Stable Knowledge Editing in Large Language Models
Zihao Wei
Liang Pang
Hanxing Ding
Jingcheng Deng
Huawei Shen
Xueqi Cheng
KELM
163
13
0
20 Feb 2024
Learning to Edit: Aligning LLMs with Knowledge Editing
Yuxin Jiang
Yufei Wang
Chuhan Wu
Wanjun Zhong
Xingshan Zeng
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Qun Liu
Wei Wang
KELM
196
43
0
19 Feb 2024
Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models
Zihao Lin
Mohammad Beigi
Hongxuan Li
Jiuxiang Gu
Yuxiang Zhang
Qifan Wang
Wenpeng Yin
Lifu Huang
KELM
130
10
0
16 Feb 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Jundong Li
274
23
0
16 Feb 2024
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation
Zhaowei Wang
Wei Fan
Qing Zong
Hongming Zhang
Sehyun Choi
Tianqing Fang
Xin Liu
Yangqiu Song
Ginny Wong
Simon See
245
17
0
16 Feb 2024
Long-form evaluation of model editing
Domenic Rosati
Robie Gonzales
Jinkun Chen
Xuemin Yu
Melis Erkan
Yahya Kayani
Satya Deepika Chavatapalli
Frank Rudzicz
Hassan Sajjad
KELM
188
16
0
14 Feb 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
International Conference on Machine Learning (ICML), 2024
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
555
152
0
11 Jan 2024
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jia-Chen Gu
Haoyang Xu
Jun-Yu Ma
Pan Lu
Zhen-Hua Ling
Kai-Wei Chang
Nanyun Peng
KELM
335
78
0
09 Jan 2024
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Pratyusha Sharma
Jordan T. Ash
Dipendra Kumar Misra
LRM
192
111
0
21 Dec 2023
Neuron-Level Knowledge Attribution in Large Language Models
Zeping Yu
Sophia Ananiadou
FAtt KELM
230
27
0
19 Dec 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
223
19
0
19 Oct 2023
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zihan Zhang
Meng Fang
Lingxi Chen
Mohammad-Reza Namazi-Rad
Jun Wang
KELM
206
38
0
11 Oct 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
International Conference on Learning Representations (ICLR), 2023
Fred Zhang
Neel Nanda
LLMSV
440
165
0
27 Sep 2023
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuheng Chen
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
261
58
0
25 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
International Conference on Learning Representations (ICLR), 2023
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
291
130
0
17 Aug 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
140
10
0
28 Jul 2023
Editing Large Language Models: Problems, Methods, and Opportunities
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yunzhi Yao
Peng Wang
Bo Tian
Shuyang Cheng
Zhoubo Li
Shumin Deng
Huajun Chen
Ningyu Zhang
KELM
268
387
0
22 May 2023
Task-Specific Skill Localization in Fine-tuned Language Models
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Nikunj Saunshi
Haoyu Zhao
Sanjeev Arora
MoMe
263
89
0
13 Feb 2023
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Neural Information Processing Systems (NeurIPS), 2022
Thomas Hartvigsen
S. Sankaranarayanan
Hamid Palangi
Yoon Kim
Marzyeh Ghassemi
KELM
538
231
0
20 Nov 2022