Knowledge Neurons in Pretrained Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

18 April 2021

Damai Dai

Li Dong

Y. Hao

Zhifang Sui

Baobao Chang

Furu Wei

KELM

Papers citing "Knowledge Neurons in Pretrained Transformers"

50 / 410 papers shown

LoFiT: Localized Fine-tuning on LLM Representations

Fangcong Yin

Xi Ye

Greg Durrett

03 Jun 2024

From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

03 Jun 2024

Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback

Jingwei Sun

Zhixu Du

Yiran Chen

KELM

30 May 2024

MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors

Renzhi Wang

Piji Li

KELM

29 May 2024

Knowledge Circuits in Pretrained Transformers

Ningyu Zhang

Shumin Deng

Huajun Chen

KELM

28 May 2024

Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization

27 May 2024

Perturbation-Restrained Sequential Model Editing

27 May 2024

Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models

24 May 2024

Linearly Controlled Language Generation with Performative Guarantees

Emily Cheng

Marco Baroni

24 May 2024

Implicit In-context Learning

International Conference on Learning Representations (ICLR), 2024

Di Liu

23 May 2024

Learnable Privacy Neurons Localization in Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

16 May 2024

Spectral Editing of Activations for Large Language Model Alignment

Neural Information Processing Systems (NeurIPS), 2024

15 May 2024

Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

15 May 2024

Localizing Task Information for Improved Model Merging and Compression

International Conference on Machine Learning (ICML), 2024

Ke Wang

Nikolaos Dimitriadis

Guillermo Ortiz-Jimenez

François Fleuret

Pascal Frossard

MoMe

13 May 2024

Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning

Masane Fuchi

Tomohiro Takagi

DiffM VLM

12 May 2024

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

International Conference on Machine Learning (ICML), 2024

09 May 2024

Binary Hypothesis Testing for Softmax Models and Leverage Score Models

Yeqi Gao

Yuzhou Gu

Zhao Song

09 May 2024

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Ruizhe Li

Yanjun Gao

KELM

06 May 2024

What does the Knowledge Neuron Thesis Have to do with Knowledge?

International Conference on Learning Representations (ICLR), 2024

03 May 2024

A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples

Chun Yu

24 Apr 2024

From Matching to Generation: A Survey on Generative Information Retrieval

Xiaoxi Li

Jiajie Jin

Peitian Zhang

23 Apr 2024

Mechanistic Interpretability for AI Safety -- A Review

Leonard Bereska

E. Gavves

AI4CE

22 Apr 2024

Decomposing and Editing Predictions by Modeling Model Computation

Harshay Shah

Andrew Ilyas

Aleksander Madry

KELM

17 Apr 2024

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

17 Apr 2024

DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion

16 Apr 2024

Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda

Johannes Schneider

15 Apr 2024

Scalable Model Editing via Customized Expert Networks

03 Apr 2024

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Sanghyun Hong

01 Apr 2024

The Unreasonable Ineffectiveness of the Deeper Layers

26 Mar 2024

Locating and Mitigating Gender Bias in Large Language Models

21 Mar 2024

A Unified Framework for Model Editing

Akshat Gupta

Dev Sajnani

Gopala Anumanchipalli

KELM

21 Mar 2024

BadEdit: Backdooring large language models by model editing

Yang Liu

20 Mar 2024

Larimar: Large Language Models with Episodic Memory Control

International Conference on Machine Learning (ICML), 2024

...

Vijil Chenthamarakshan

18 Mar 2024

Towards a theory of model distillation

Enric Boix-Adserà

FedML VLM

14 Mar 2024

VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

Neural Information Processing Systems (NeurIPS), 2024

Qiang Liu

Liang Wang

12 Mar 2024

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

03 Mar 2024

Information Flow Routes: Automatically Interpreting Language Models at Scale

Javier Ferrando

Elena Voita

27 Feb 2024

InstructEdit: Instruction-based Knowledge Editing for Large Language Models

Ningyu Zhang

Huajun Chen

25 Feb 2024

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Philip Quirke

Shay B. Cohen

Fazl Barez

23 Feb 2024

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Ruiqi Zhang

Jingfeng Wu

Peter L. Bartlett

22 Feb 2024

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Jun Zhao

Kang Liu

MoE ALM

20 Feb 2024

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

Yuxiang Zhang

16 Feb 2024

Rethinking Machine Unlearning for Large Language Models

...

Mohit Bansal

Yang Liu

13 Feb 2024

Discriminative Adversarial Unlearning

10 Feb 2024

Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting

Liang Sun

08 Feb 2024

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Reduan Achtibat

Sayed Mohammad Vakilzadeh Hatefi

08 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Kaixuan Huang

Mengdi Wang

07 Feb 2024

Exploring higher-order neural network node interactions with total correlation

Thomas Kerby

Teresa White

Kevin Moon

06 Feb 2024

Neighboring Perturbations of Knowledge Editing on Large Language Models

Ningyu Zhang

31 Jan 2024

Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

31 Jan 2024