2202.05262
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,361 papers shown
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Amélie Reymond
LRM
320
8
0
21 May 2023
Decouple knowledge from parameters for plug-and-play language modeling
Xin Cheng
Yankai Lin
Preslav Nakov
Dongyan Zhao
Rui Yan
KELM
227
2
0
19 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Neural Information Processing Systems (NeurIPS), 2023
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
434
108
0
15 May 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
201
1
0
15 May 2023
FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shangbin Feng
Vidhisha Balachandran
Yuyang Bai
Yulia Tsvetkov
KELM
HILM
308
62
0
14 May 2023
RECKONING: Reasoning through Dynamic Knowledge Encoding
Neural Information Processing Systems (NeurIPS), 2023
Zeming Chen
Gail Weiss
E. Mitchell
Asli Celikyilmaz
Antoine Bosselut
KELM
LRM
349
15
0
10 May 2023
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Tao Hong
64
1
0
08 May 2023
Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kaixin Ma
Hao Cheng
Yu Zhang
Xiaodong Liu
Eric Nyberg
Jianfeng Gao
LRM
189
20
0
04 May 2023
ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pengfei Hong
Rishabh Bhardwaj
Navonil Majumdar
Somak Aditya
Soujanya Poria
AAML
151
1
0
04 May 2023
Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yasumasa Onoe
Michael J.Q. Zhang
Shankar Padmanabhan
Greg Durrett
Eunsol Choi
KELM
421
84
0
02 May 2023
Key-Locked Rank One Editing for Text-to-Image Personalization
International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023
Yoad Tewel
Rinon Gal
Gal Chechik
Yuval Atzmon
DiffM
425
217
0
02 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
520
286
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Neural Information Processing Systems (NeurIPS), 2023
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
1.0K
179
0
30 Apr 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
531
442
0
28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
724
418
0
28 Apr 2023
Label-Free Concept Bottleneck Models
International Conference on Learning Representations (ICLR), 2023
Tuomas P. Oikarinen
Subhro Das
Lam M. Nguyen
Tsui-Wei Weng
332
239
0
12 Apr 2023
Localizing Model Behavior with Path Patching
Nicholas W. Goldowsky-Dill
Chris MacLeod
L. Sato
Aryaman Arora
489
122
0
12 Apr 2023
Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez
Belinda Z. Li
Jacob Andreas
KELM
300
123
0
03 Apr 2023
Ablating Concepts in Text-to-Image Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Nupur Kumari
Bin Zhang
Sheng-Yu Wang
Eli Shechtman
Richard Y. Zhang
Jun-Yan Zhu
VLM
478
282
0
23 Mar 2023
Language Model Behavior: A Comprehensive Survey
International Conference on Computational Logic (ICCL), 2023
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
372
139
0
20 Mar 2023
Context-faithful Prompting for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenxuan Zhou
Sheng Zhang
Hoifung Poon
Muhao Chen
KELM
240
80
0
20 Mar 2023
Editing Implicit Assumptions in Text-to-Image Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Hadas Orgad
Bahjat Kawar
Yonatan Belinkov
DiffM
363
115
0
14 Mar 2023
Erasing Concepts from Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Rohit Gandikota
Joanna Materzyńska
Jaden Fiotto-Kaufman
David Bau
DiffM
494
431
0
13 Mar 2023
Making a Computational Attorney
SIAM International Conference on Data Mining (SDM), 2023
Dell Zhang
Frank Schilder
Jack G. Conrad
Masoud Makrehchi
David von Rickenbach
Isabelle Moulinier
165
1
0
07 Mar 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Conference on Causal Learning and Reasoning (CLeaR), 2023
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas Icard
Noah D. Goodman
CML
499
139
0
05 Mar 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
357
7
0
01 Mar 2023
Edit at your own risk: evaluating the robustness of edited models to distribution shifts
Davis Brown
Charles Godfrey
Cody Nizinski
Jonathan Tu
Henry Kvinge
KELM
246
8
0
28 Feb 2023
Inseq: An Interpretability Toolkit for Sequence Generation Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Gabriele Sarti
Nils Feldhus
Ludwig Sickert
Oskar van der Wal
Malvina Nissim
Arianna Bisazza
317
90
0
27 Feb 2023
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Conference on Fairness, Accountability and Transparency (FAccT), 2023
Max Lamparth
Anka Reuel
KELM
213
15
0
24 Feb 2023
Task-Specific Skill Localization in Fine-tuned Language Models
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Nikunj Saunshi
Haoyu Zhao
Sanjeev Arora
MoMe
322
89
0
13 Feb 2023
What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce
Zixin Wen
Yelong Shen
Yuan-Fang Li
178
36
0
07 Feb 2023
Effective Data Augmentation With Diffusion Models
International Conference on Learning Representations (ICLR), 2023
Brandon Trabucco
Kyle Doherty
Max Gurinas
Ruslan Salakhutdinov
DiffM
VLM
472
336
0
07 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
International Conference on Learning Representations (ICLR), 2023
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
463
25
0
01 Feb 2023
Do Multi-Document Summarization Models Synthesize?
Transactions of the Association for Computational Linguistics (TACL), 2023
Jay DeYoung
Stephanie C. Martinez
Iain J. Marshall
Byron C. Wallace
287
14
0
31 Jan 2023
Truth Machines: Synthesizing Veracity in AI Language Models
AI & Society, 2023
Luke Munn
Liam Magee
Vanicka Arora
SyDa
HILM
96
50
0
28 Jan 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
Neural Information Processing Systems (NeurIPS), 2023
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
494
87
0
12 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
180
9
0
05 Jan 2023
A Survey on Knowledge-Enhanced Pre-trained Language Models
Chaoqi Zhen
Yanlei Shang
Xiangyu Liu
Yifei Li
Yong Chen
Dell Zhang
VLM
KELM
238
3
0
27 Dec 2022
DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Prakhar Gupta
Yang Liu
Di Jin
Behnam Hedayatnia
Spandana Gella
Sijia Liu
P. Lange
Julia Hirschberg
Dilek Z. Hakkani-Tür
220
6
0
20 Dec 2022
DSI++: Updating Transformer Memory with New Documents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Sanket Vaibhav Mehta
Jai Gupta
Yi Tay
Mostafa Dehghani
Vinh Q. Tran
J. Rao
Marc Najork
Emma Strubell
Donald Metzler
CLL
223
60
0
19 Dec 2022
Talking About Large Language Models
Communications of the ACM (CACM), 2022
Murray Shanahan
AI4CE
350
373
0
07 Dec 2022
Language Models as Agent Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jacob Andreas
LLMAG
269
169
0
03 Dec 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
MLT
226
15
0
20 Nov 2022
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Neural Information Processing Systems (NeurIPS), 2022
Thomas Hartvigsen
S. Sankaranarayanan
Hamid Palangi
Yoon Kim
Marzyeh Ghassemi
KELM
639
234
0
20 Nov 2022
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper
K. Hariharan
Dylan Hadfield-Menell
AAML
401
11
0
18 Nov 2022
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ella Neeman
Roee Aharoni
Or Honovich
Leshem Choshen
Idan Szpektor
Omri Abend
KELM
CML
230
101
0
10 Nov 2022
On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
Xu Guo
Han Yu
LM&MA
VLM
307
34
0
06 Nov 2022
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2022
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
616
775
0
01 Nov 2022
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
Conference on Computational Natural Language Learning (CoNLL), 2022
Aaron Mueller
Yudi Xia
Tal Linzen
MILM
233
13
0
25 Oct 2022
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alessandro Stolfo
Zhijing Jin
Kumar Shridhar
Bernhard Schölkopf
Mrinmaya Sachan
ELM
OOD
LRM
384
78
0
21 Oct 2022