ResearchTrend.AI
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

10 January 2023
Peter Hase
Mohit Bansal
Been Kim
Asma Ghandeharioun
    MILM

Papers citing "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models"

50 / 143 papers shown
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
85
0
0
09 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Vaidehi Patil
Yi-Lin Sung
Peter Hase
Jie Peng
Tianlong Chen
Mohit Bansal
AAML
MU
79
3
0
01 May 2025
Functional Abstraction of Knowledge Recall in Large Language Models
Zijian Wang
Chang Xu
KELM
32
0
0
20 Apr 2025
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu
Xiaoze Liu
Feijie Wu
Xiaoqian Wang
Jing Gao
MU
56
0
0
29 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
H. Chen
Nanyun Peng
KELM
54
1
0
20 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin
Jian Xie
Siyu Yuan
Deqing Yang
ReLM
LRM
66
2
0
10 Mar 2025
SAKE: Steering Activations for Knowledge Editing
Marco Scialanga
Thibault Laugel
Vincent Grari
Marcin Detyniecki
KELM
LLMSV
72
1
0
03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu
R. Li
Yunzhe Qi
Hui Liu
X. Tang
...
Qingyu Yin
Monica Cheng
Jun Huan
Haoyu Wang
Jing Gao
KELM
43
2
0
01 Mar 2025
A Causal Lens for Evaluating Faithfulness Metrics
Kerem Zaman
Shashank Srivastava
63
0
0
26 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
40
3
0
24 Feb 2025
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
115
17
0
21 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge Editing
Pinzheng Wang
Zecheng Tang
Keyan Zhou
J. Li
Qiaoming Zhu
M. Zhang
KELM
115
2
0
21 Feb 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan
Arnab Sen Sharma
Silvio Amir
David Bau
Byron C. Wallace
80
0
0
20 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
81
4
0
20 Feb 2025
The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
Yuheng Chen
Pengfei Cao
Kang Liu
Jun Zhao
43
0
0
18 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
145
0
0
17 Feb 2025
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
X. Wang
Yan Hu
Wenyu Du
Reynold Cheng
Benyou Wang
Difan Zou
51
0
0
17 Feb 2025
Lifelong Sequential Knowledge Editing without Model Degradation
Akshat Gupta
Phudish Prateepamornkul
Maochuan Lu
Ahmed Alaa
Thomas Hartvigsen
Gopala Anumanchipalli
KELM
70
1
0
03 Feb 2025
Risk-Aware Distributional Intervention Policies for Language Models
Bao Nguyen
Binh Nguyen
Duy Nguyen
V. Nguyen
28
1
0
28 Jan 2025
Making Sense Of Distributed Representations With Activation Spectroscopy
Kyle Reing
Greg Ver Steeg
Aram Galstyan
29
0
0
28 Jan 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
47
0
0
22 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
62
17
0
08 Jan 2025
Information Anxiety in Large Language Models
Prasoon Bajpai
Sarah Masud
Tanmoy Chakraborty
37
0
0
16 Nov 2024
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
38
5
0
07 Nov 2024
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guan Zhe Hong
Nishanth Dikkala
Enming Luo
Cyrus Rashtchian
Xin Wang
Rina Panigrahy
OffRL
LRM
NAI
29
0
0
06 Nov 2024
Learning Where to Edit Vision Transformers
Yunqiao Yang
Long-Kai Huang
Shengzhuang Chen
Kede Ma
Ying Wei
KELM
28
1
0
04 Nov 2024
Reasons and Solutions for the Decline in Model Performance after Editing
Xiusheng Huang
Jiaxiang Liu
Yequan Wang
Kang-Jun Liu
KELM
41
4
0
31 Oct 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
Emanuele Marconato
Sébastien Lachapelle
Sebastian Weichwald
Luigi Gresele
64
3
0
30 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga
Besmira Nushi
Varun Chandrasekaran
49
0
0
29 Oct 2024
Learning and Unlearning of Fabricated Knowledge in Language Models
Chen Sun
Nolan Miller
A. Zhmoginov
Max Vladymyrov
Mark Sandler
KELM
MU
27
1
0
29 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
34
0
0
18 Oct 2024
Hypothesis Testing the Circuit Hypothesis in LLMs
Claudia Shi
Nicolas Beltran-Velez
Achille Nazaret
Carolina Zheng
Adrià Garriga-Alonso
Andrew Jesson
Maggie Makar
David M. Blei
37
6
0
16 Oct 2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
31
5
0
16 Oct 2024
Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing
Weichuan Wang
Zhaoyi Li
Defu Lian
Chen Ma
Linqi Song
Ying Wei
43
5
0
09 Oct 2024
Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr
Kevin Du
Vésteinn Snæbjarnarson
Robert West
Ryan Cotterell
Aaron Schein
LLMSV
LRM
29
4
0
07 Oct 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang
Jiahao Huo
Yibo Yan
Kun Wang
Yutao Yue
Xuming Hu
31
2
0
07 Oct 2024
Defining Knowledge: Bridging Epistemology and Large Language Models
Constanza Fierro
Ruchira Dhar
Filippos Stamatiou
Nicolas Garneau
Anders Søgaard
KELM
23
4
0
03 Oct 2024
Better Call SAUL: Fluent and Consistent Language Model Editing with Generation Regularization
Mingyang Wang
Lukas Lange
Heike Adel
Jannik Strötgen
Hinrich Schütze
KELM
28
2
0
03 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
45
10
0
03 Oct 2024
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Junfeng Fang
Houcheng Jiang
Kun Wang
Yunshan Ma
Shi Jie
Xiangnan He
Tat-Seng Chua
KELM
35
33
0
03 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
80
1
0
02 Oct 2024
Optimal ablation for interpretability
Maximilian Li
Lucas Janson
FAtt
44
2
0
16 Sep 2024
Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models
Jiao Chen
Jiayi He
Fangfang Chen
Zuohong Lv
Jianhua Tang
Weihua Li
Zuozhu Liu
Howard H. Yang
Guangjie Han
AI4CE
34
1
0
02 Sep 2024
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models
Xiyu Liu
Zhengxiao Liu
Naibin Gu
Zheng-Shen Lin
Wanli Ma
Ji Xiang
Weiping Wang
KELM
44
0
0
27 Aug 2024
Artificial intelligence for science: The easy and hard problems
Ruairidh M. Battleday
Samuel Gershman
AIMat
18
0
0
24 Aug 2024
Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory
Yongxin Deng
Xihe Qiu
Xiaoyu Tan
Jing Pan
Chen Jue
Zhijun Fang
Yinghui Xu
Wei Chu
Yuan Qi
26
2
0
20 Aug 2024
Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks
Verna Dankers
Ivan Titov
35
5
0
09 Aug 2024
Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering
Tiezheng Guo
Chen Wang
Yanyi Liu
Jiawei Tang
Pan Li
Sai Xu
Qingwen Yang
Xianlin Gao
Zhi Li
Yingyou Wen
RALM
27
1
0
06 Aug 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
42
18
0
02 Aug 2024