Locating and Editing Factual Associations in GPT

Neural Information Processing Systems (NeurIPS), 2022

10 February 2022

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,361 papers shown

Optimal ablation for interpretability

Neural Information Processing Systems (NeurIPS), 2024

Maximilian Li

Lucas Janson

FAtt

343

16 Sep 2024

Causal Inference with Large Language Model: A Survey

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Jing Ma

CML LRM

584

15 Sep 2024

Prevailing Research Areas for Music AI in the Era of Foundation Models

428

14 Sep 2024

Synthetic continued pretraining

International Conference on Learning Representations (ICLR), 2024

349

11 Sep 2024

Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts

Wieland Brendel

256

09 Sep 2024

OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System

Bozhong Tian

...

Lei Liang

Qing Cui

Xiaowei Zhu

Jun Zhou

Huajun Chen

KELM

199

09 Sep 2024

Representational Analysis of Binding in Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Qin Dai

Benjamin Heinzerling

Kentaro Inui

313

09 Sep 2024

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Maheep Chaudhary

Atticus Geiger

248

05 Sep 2024

Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024

Amit Ben Artzy

Roy Schwartz

164

05 Sep 2024

Interpreting and Improving Large Language Models in Arithmetic Calculation

International Conference on Machine Learning (ICML), 2024

Wei Zhang

Chaoqun Wan

Yonggang Zhang

Yiu-ming Cheung

Xinmei Tian

Xu Shen

Jieping Ye

LRM

323

03 Sep 2024

Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models

International Conference on Information and Knowledge Management (CIKM), 2024

Yifan Wei

Xiaoyan Yu

Yixuan Weng

Huanhuan Ma

Yuanzhe Zhang

Jun Zhao

Kang Liu

KELM

202

01 Sep 2024

Modularity in Transformers: Investigating Neuron Separability & Specialization

Nicholas Pochinkov

Thomas Jones

Mohammed Rashidur Rahman

175

30 Aug 2024

Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning

Maxime Méloux

Christophe Cerisara

KELM CLL

256

30 Aug 2024

How Reliable are Causal Probing Interventions?

343

28 Aug 2024

Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

Weiping Wang

424

27 Aug 2024

Can Transformers Do Enumerative Geometry?

International Conference on Learning Representations (ICLR), 2024

Baran Hashemi

Roderic G. Corominas

Alessandro Giacchetto

904

27 Aug 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

341

23 Aug 2024

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

...

322

22 Aug 2024

Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing

Mengqi Zhang

Zhumin Chen

Liang Wang

KELM

164

22 Aug 2024

Defending against Jailbreak through Early Exit Generation of Large Language Models

238

21 Aug 2024

Personality Alignment of Large Language Models

International Conference on Learning Representations (ICLR), 2024

348

21 Aug 2024

Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs

230

20 Aug 2024

MEGen: Generative Backdoor into Large Language Models via Model Editing

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

274

20 Aug 2024

KAN 2.0: Kolmogorov-Arnold Networks Meet Science

Ziming Liu

Pingchuan Ma

Yixuan Wang

Wojciech Matusik

Max Tegmark

344

159

19 Aug 2024

Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

AAAI Conference on Artificial Intelligence (AAAI), 2024

692

19 Aug 2024

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

AAAI Conference on Artificial Intelligence (AAAI), 2024

Quan Wang

Yongdong Zhang

208

19 Aug 2024

Activated Parameter Locating via Causal Intervention for Model Merging

162

18 Aug 2024

Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

306

16 Aug 2024

Lower Layers Matter: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused

235

16 Aug 2024

Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

Chenhui Hu

Pengfei Cao

Yubo Chen

Kang Liu

Jun Zhao

KELM

377

14 Aug 2024

Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Verna Dankers

Ivan Titov

261

09 Aug 2024

UNLEARN Efficient Removal of Knowledge in Large Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Tyler Lizzo

Larry Heck

KELM MoMe MU

263

08 Aug 2024

KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

Xu Chu

176

06 Aug 2024

Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

326

06 Aug 2024

The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights

136

05 Aug 2024

The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis

Computational Linguistics (CL), 2024

...

497

02 Aug 2024

Revisiting Bi-Encoder Neural Search: An Encoding--Searching Separation Perspective

Danbinaerin Han

Akiko Aizawa

Sihun Lee

211

02 Aug 2024

Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment

587

31 Jul 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

295

30 Jul 2024

Machine Unlearning in Generative AI: A Survey

327

30 Jul 2024

Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Jorge García-Carrasco

A. Maté

Juan Trujillo

AAML

199

29 Jul 2024

Can Editing LLMs Inject Harm?

Zhaorun Chen

...

408

29 Jul 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

Nitay Calderon

Roi Reichart

358

27 Jul 2024

Demystifying Verbatim Memorization in Large Language Models

Jing Huang

Diyi Yang

Christopher Potts

ELM PILM MU

306

25 Jul 2024

Model editing for distribution shifts in uranium oxide morphological analysis

216

22 Jul 2024

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Shumin Deng

...

Yong Jiang

Pengjun Xie

Fei Huang

Huajun Chen

Ningyu Zhang

332

22 Jul 2024

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

273

21 Jul 2024

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Oyvind Tafjord

218

21 Jul 2024

LeKUBE: A Legal Knowledge Update BEnchmark

206

19 Jul 2024

Investigating the Indirect Object Identification circuit in Mamba

Danielle Ensign

Adrià Garriga-Alonso

Mamba

162

19 Jul 2024