
Locating and Editing Factual Associations in GPT

Neural Information Processing Systems (NeurIPS), 2022

10 February 2022


Papers citing "Locating and Editing Factual Associations in GPT"

Showing 50 of 1,361 citing papers

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

344

02 Jun 2024

DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models

Chengyu Wang

247

31 May 2024

Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries

329

31 May 2024

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

...

Bruno Régaldo-Saint Blancard

Kyunghyun Cho

Shirley Ho

186

30 May 2024

TAIA: Large Language Models are Out-of-Distribution Data Learners

229

30 May 2024

Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback

Jingwei Sun

Zhixu Du

Yiran Chen

KELM

256

30 May 2024

MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors

Renzhi Wang

Piji Li

KELM

285

29 May 2024

Evaluating the External and Parametric Knowledge Fusion of Large Language Models

...

Lifeng Shang

Qun Liu

Yong Liu

Ruiming Tang

KELM

246

29 May 2024

Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning

Renzhi Wang

Piji Li

184

28 May 2024

Knowledge Circuits in Pretrained Transformers

Ningyu Zhang

Shumin Deng

Huajun Chen

KELM

438

28 May 2024

Improved Generation of Adversarial Examples Against Safety-aligned LLMs

Wangmeng Zuo

243

28 May 2024

InversionView: A General-Purpose Method for Reading Information from Neural Activations

359

27 May 2024

Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias

237

27 May 2024

Cross-Modal Safety Alignment: Is textual unlearning all you need?

Amit K. Roy-Chowdhury

Chengyu Song

252

27 May 2024

Perturbation-Restrained Sequential Model Editing

520

27 May 2024

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

444

26 May 2024

Large Scale Knowledge Washing

426

26 May 2024

Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

Lijie Hu

326

24 May 2024

Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models

238

24 May 2024

Sparse Matrix in Large Language Model Fine-tuning

313

24 May 2024

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

704

24 May 2024

Linearly Controlled Language Generation with Performative Guarantees

Emily Cheng

Marco Baroni

378

24 May 2024

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

382

23 May 2024

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

Bernal Jiménez Gutiérrez

370

116

23 May 2024

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

Peng Wang

Ningyu Zhang

Fei Huang

Huajun Chen

KELM CLL

312

23 May 2024

Automatically Identifying Local and Global Circuits with Linear Computation Graphs

Xipeng Qiu

256

22 May 2024

Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity

592

22 May 2024

Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts

Baolong Bi

Shenghua Liu

Lingrui Mei

289

19 May 2024

BadActs: A Universal Backdoor Defense in the Activation Space

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

181

18 May 2024

Learnable Privacy Neurons Localization in Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

220

16 May 2024

Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

334

15 May 2024

Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

Transactions of the Association for Computational Linguistics (TACL), 2024

...

311

15 May 2024

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Aleksandar Makelov

Georg Lange

Neel Nanda

359

14 May 2024

Can Language Models Explain Their Own Classification Behavior?

Dane Sherburn

Bilal Chughtai

Owain Evans

214

13 May 2024

Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning

Masane Fuchi

Tomohiro Takagi

DiffM VLM

266

12 May 2024

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

415

228

09 May 2024

Learned feature representations are biased by complexity, learning order, position, and more

275

09 May 2024

Binary Hypothesis Testing for Softmax Models and Leverage Score Models

Yeqi Gao

Yuzhou Gu

Zhao Song

413

09 May 2024

A Causal Explainable Guardrails for Large Language Models

192

07 May 2024

How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability

Jorge García-Carrasco

Alejandro Maté

Juan Trujillo

205

07 May 2024

FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference

353

07 May 2024

A Philosophical Introduction to Language Models - Part II: The Way Forward

Raphael Milliere

Cameron Buckner

LRM

282

06 May 2024

To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models

International Conference on Machine Learning (ICML), 2024

George-Octavian Barbulescu

Peter Triantafillou

359

06 May 2024

Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation

301

06 May 2024

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Ruizhe Li

Yanjun Gao

KELM

341

06 May 2024

Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Chengyu Wang

384

06 May 2024

What does the Knowledge Neuron Thesis Have to do with Knowledge?

International Conference on Learning Representations (ICLR), 2024

337

03 May 2024

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3

Junsang Yoon

Akshat Gupta

Gopala Anumanchipalli

141

01 May 2024

KAN: Kolmogorov-Arnold Networks

986

1,261

30 Apr 2024

Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

219

29 Apr 2024