2202.05262
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,361 papers shown
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
Shreyas Kapur
Vasil Georgiev
Cameron Allen
Scott Emmons
Stuart J. Russell
344
20
0
02 Jun 2024
DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models
Taolin Zhang
Qizhou Chen
Dongyang Li
Chengyu Wang
Xiaofeng He
Longtao Huang
Hui Xue
Junyuan Huang
CLL
KELM
247
8
0
31 May 2024
Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries
Jiahao Yu
Haozheng Luo
Jerry Yao-Chieh Hu
Wenbo Guo
Han Liu
Xinyu Xing
329
21
0
31 May 2024
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar
Alberto Bietti
Mariel Pettee
Michael Eickenberg
M. Cranmer
...
Ruben Ohana
Liam Parker
Bruno Régaldo-Saint Blancard
Kyunghyun Cho
Shirley Ho
186
5
0
30 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
229
7
0
30 May 2024
Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback
Jingwei Sun
Zhixu Du
Yiran Chen
KELM
256
4
0
30 May 2024
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors
Renzhi Wang
Piji Li
KELM
285
7
0
29 May 2024
Evaluating the External and Parametric Knowledge Fusion of Large Language Models
Hao Zhang
Yuyang Zhang
Xiaoguang Li
Wenxuan Shi
Haonan Xu
...
Yasheng Wang
Lifeng Shang
Qun Liu
Yong Liu
Ruiming Tang
KELM
246
7
0
29 May 2024
Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning
Renzhi Wang
Piji Li
184
5
0
28 May 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
438
43
0
28 May 2024
Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
AAML
SILM
243
12
0
28 May 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Xinting Huang
Madhur Panwar
Navin Goyal
Michael Hahn
359
9
0
27 May 2024
Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias
Xingbo He
Wenqi Fan
Ruobing Wang
Yili Wang
Ying Wang
Shirui Pan
Xin Wang
CML
237
7
0
27 May 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty
Erfan Shayegani
Zikui Cai
Nael B. Abu-Ghazaleh
M. Salman Asif
Yue Dong
Amit K. Roy-Chowdhury
Chengyu Song
252
23
0
27 May 2024
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
520
16
0
27 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
444
51
0
26 May 2024
Large Scale Knowledge Washing
Yu Wang
Ruihan Wu
Zexue He
Xinyu Chen
Julian McAuley
MU
KELM
426
13
0
26 May 2024
Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
Keyuan Cheng
Muhammad Asif Ali
Shu Yang
Gang Lin
Yuxuan Zhai
Haoyang Fei
Ke Xu
Lu Yu
Lijie Hu
Haiyan Zhao
KELM
326
11
0
24 May 2024
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Jingcheng Deng
Zihao Wei
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
238
2
0
24 May 2024
Sparse Matrix in Large Language Model Fine-tuning
Haoze He
Juncheng Billy Li
Xuan Jiang
Heather Miller
MoE
313
8
0
24 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
Alessandro Laio
Marco Baroni
704
32
0
24 May 2024
Linearly Controlled Language Generation with Performative Guarantees
Emily Cheng
Marco Baroni
378
13
0
24 May 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Boshi Wang
Xiang Yue
Yu-Chuan Su
Huan Sun
LRM
382
75
0
23 May 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Bernal Jiménez Gutiérrez
Yiheng Shu
Yu Gu
Michihiro Yasunaga
Yu-Chuan Su
RALM
CLL
370
116
0
23 May 2024
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Peng Wang
Zexi Li
Ningyu Zhang
Ziwen Xu
Yunzhi Yao
Yong Jiang
Pengjun Xie
Fei Huang
Huajun Chen
KELM
CLL
312
61
0
23 May 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge
Fukang Zhu
Wentao Shu
Junxuan Wang
Zhengfu He
Xipeng Qiu
256
18
0
22 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim Dey
Yiting He
Yiqiao Zhong
Junjie Hu
592
7
0
22 May 2024
Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts
Baolong Bi
Shenghua Liu
Lingrui Mei
Yiwei Wang
Pengliang Ji
Xueqi Cheng
KELM
289
43
0
19 May 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
181
20
0
18 May 2024
Learnable Privacy Neurons Localization in Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruizhe Chen
Tianxiang Hu
Yang Feng
Zuo-Qiang Liu
220
28
0
16 May 2024
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Ruizhe Chen
Yichen Li
Zikai Xiao
Zuo-Qiang Liu
KELM
334
18
0
15 May 2024
Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models
Transactions of the Association for Computational Linguistics (TACL), 2024
Anna A. Ivanova
Aalok Sathe
Benjamin Lipkin
Unnathi Kumar
S. Radkani
...
Leshem Choshen
Roger Levy
Evelina Fedorenko
Josh Tenenbaum
Jacob Andreas
311
56
0
15 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov
Georg Lange
Neel Nanda
359
61
0
14 May 2024
Can Language Models Explain Their Own Classification Behavior?
Dane Sherburn
Bilal Chughtai
Owain Evans
214
2
0
13 May 2024
Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
Masane Fuchi
Tomohiro Takagi
DiffM
VLM
266
25
0
12 May 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
415
228
0
09 May 2024
Learned feature representations are biased by complexity, learning order, position, and more
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Katherine Hermann
AI4CE
FaML
SSL
OOD
275
20
0
09 May 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao Song
413
1
0
09 May 2024
A Causal Explainable Guardrails for Large Language Models
Zhixuan Chu
Yan Wang
Longfei Li
Peng Kuang
Zhan Qin
Kui Ren
LLMSV
192
13
0
07 May 2024
How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
Jorge García-Carrasco
Alejandro Maté
Juan Trujillo
205
12
0
07 May 2024
FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference
Runheng Liu
Xingchen Xiao
Heyan Huang
Zewen Chi
Zhijing Wu
RALM
KELM
353
1
0
07 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
282
24
0
06 May 2024
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
International Conference on Machine Learning (ICML), 2024
George-Octavian Barbulescu
Peter Triantafillou
MU
359
33
0
06 May 2024
Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation
Kaize Shi
Xueyao Sun
Qing Li
Guandong Xu
301
22
0
06 May 2024
Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruizhe Li
Yanjun Gao
KELM
341
13
0
06 May 2024
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Qizhou Chen
Taolin Zhang
Xiaofeng He
Dongyang Li
Chengyu Wang
Longtao Huang
Hui Xue
CLL
KELM
384
33
0
06 May 2024
What does the Knowledge Neuron Thesis Have to do with Knowledge?
International Conference on Learning Representations (ICLR), 2024
Jingcheng Niu
Andrew Liu
Zining Zhu
Gerald Penn
337
47
0
03 May 2024
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Junsang Yoon
Akshat Gupta
Gopala Anumanchipalli
141
9
0
01 May 2024
KAN: Kolmogorov-Arnold Networks
Ziming Liu
Yixuan Wang
Sachin Vaidya
Fabian Ruehle
James Halverson
Marin Soljacic
Thomas Y. Hou
Max Tegmark
986
1,261
0
30 Apr 2024
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu
Pepa Atanasova
Isabelle Augenstein
KELM
219
10
0
29 Apr 2024