2202.05262
Cited By
v5 (latest)
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,361 papers shown
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath
Matthew Rahtz
János Kramár
Vladimir Mikulik
Shane Legg
MILM
LRM
229
91
0
28 Jul 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
189
11
0
28 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
367
731
0
27 Jul 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
Roi Cohen
Eden Biran
Ori Yoran
Amir Globerson
Mor Geva
KELM
419
231
0
24 Jul 2023
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
Neural Information Processing Systems (NeurIPS), 2023
Neel Guha
Mayee F. Chen
Kush S. Bhatia
Azalia Mirhoseini
Frederic Sala
Christopher Ré
186
4
0
20 Jul 2023
Deceptive Alignment Monitoring
Andres Carranza
Dhruv Pai
Rylan Schaeffer
Arnuv Tandon
Oluwasanmi Koyejo
220
13
0
20 Jul 2023
Can Neural Network Memorization Be Localized?
International Conference on Machine Learning (ICML), 2023
Pratyush Maini
Michael C. Mozer
Hanie Sedghi
Zachary Chase Lipton
J. Zico Kolter
Chiyuan Zhang
TDI
182
73
0
18 Jul 2023
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
International Conference on Learning Representations (ICLR), 2023
Danny Halawi
Jean-Stanislas Denain
Jacob Steinhardt
315
72
0
18 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
322
141
0
18 Jul 2023
Discovering Variable Binding Circuitry with Desiderata
Xander Davies
Max Nadeau
Nikhil Prakash
Tamar Rott Shaham
David Bau
192
21
0
07 Jul 2023
An Overview of Catastrophic AI Risks
Dan Hendrycks
Mantas Mazeika
Thomas Woodside
SILM
601
253
0
21 Jun 2023
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Neural Information Processing Systems (NeurIPS), 2023
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
223
22
0
16 Jun 2023
Propagating Knowledge Updates to LMs Through Distillation
Neural Information Processing Systems (NeurIPS), 2023
Shankar Padmanabhan
Yasumasa Onoe
Michael J.Q. Zhang
Greg Durrett
Eunsol Choi
KELM
268
21
0
15 Jun 2023
Operationalising Representation in Natural Language Processing
British Journal for the Philosophy of Science (BJPS), 2023
J. Harding
351
17
0
14 Jun 2023
Measuring and Modifying Factual Knowledge in Large Language Models
International Conference on Machine Learning and Applications (ICMLA), 2023
Pouya Pezeshkpour
KELM
198
22
0
09 Jun 2023
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shizhe Diao
Tianyang Xu
Ruijia Xu
Jiawei Wang
Tong Zhang
MoE
AI4CE
224
51
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
323
8
0
06 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Neural Information Processing Systems (NeurIPS), 2023
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
758
839
0
06 Jun 2023
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Neural Information Processing Systems (NeurIPS), 2023
Owen Queen
Thomas Hartvigsen
Teddy Koker
Huan He
Theodoros Tsiligkaridis
Marinka Zitnik
AI4TS
312
34
0
03 Jun 2023
Learning Transformer Programs
Neural Information Processing Systems (NeurIPS), 2023
Dan Friedman
Alexander Wettig
Danqi Chen
301
48
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
Neural Information Processing Systems (NeurIPS), 2023
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Edouard Grave
Léon Bottou
395
142
0
01 Jun 2023
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Dana Arad
Hadas Orgad
Yonatan Belinkov
KELM
348
29
0
01 Jun 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Bo Pan
ALM
417
216
0
30 May 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Neural Information Processing Systems (NeurIPS), 2023
Zehao Wang
Alexander Ku
Jason Baldridge
Thomas Griffiths
Been Kim
UQCV
261
15
0
29 May 2023
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
J. Hoelscher-Obermaier
Julia Persson
Esben Kran
Ioannis Konstas
Fazl Barez
KELM
261
70
0
27 May 2023
Theoretical and Practical Perspectives on what Influence Functions Do
Neural Information Processing Systems (NeurIPS), 2023
Andrea Schioppa
Katja Filippova
Ivan Titov
Polina Zablotskaia
TDI
177
32
0
26 May 2023
Backpack Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
John Hewitt
John Thickstun
Christopher D. Manning
Abigail Z. Jacobs
KELM
239
20
0
26 May 2023
ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
ACM Transactions on Graphics (TOG), 2023
Yuxin Zhang
Weiming Dong
Fan Tang
Nisha Huang
Haibin Huang
Chongyang Ma
Tong-Yee Lee
Oliver Deussen
Changsheng Xu
DiffM
431
120
0
25 May 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
340
85
0
25 May 2023
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
International Conference on Learning Representations (ICLR), 2023
Niels Mündler
Jingxuan He
Slobodan Jenko
Martin Vechev
HILM
314
159
0
25 May 2023
Editable Graph Neural Network for Node Classifications
Zirui Liu
Zhimeng Jiang
Shaochen Zhong
Kaixiong Zhou
Li Li
Rui Chen
Soo-Hyun Choi
Helen Zhou
227
8
0
24 May 2023
Referral Augmentation for Zero-Shot Information Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Michael Tang
Shunyu Yao
John Yang
Karthik Narasimhan
249
3
0
24 May 2023
Meta-Learning Online Adaptation of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nathan J. Hu
E. Mitchell
Christopher D. Manning
Chelsea Finn
KELM
297
43
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
286
67
0
24 May 2023
Editing Common Sense in Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Anshita Gupta
Debanjan Mondal
Akshay Krishna Sheshadri
Wenlong Zhao
Xiang Lorraine Li
Sarah Wiegreffe
Niket Tandon
KELM
230
32
0
24 May 2023
Mitigating Temporal Misalignment by Discarding Outdated Facts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Michael J.Q. Zhang
Eunsol Choi
KELM
HILM
287
24
0
24 May 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zexuan Zhong
Zhengxuan Wu
Christopher D. Manning
Christopher Potts
Danqi Chen
KELM
396
276
0
24 May 2023
Can Transformers Learn to Solve Problems Recursively?
Shizhuo Zhang
Curt Tigges
Stella Biderman
Maxim Raginsky
Talia Ringer
176
21
0
24 May 2023
All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren
Qipeng Guo
Zhijing Jin
Shauli Ravfogel
Mrinmaya Sachan
Bernhard Schölkopf
Robert Bamler
149
5
0
23 May 2023
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models
Shashank Sonkar
Richard G. Baraniuk
126
2
0
23 May 2023
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sina J. Semnani
Violet Z. Yao
He Zhang
M. Lam
KELM
AI4MH
340
103
0
23 May 2023
Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tim Schott
Daniel Furman
Shreshta Bhat
ELM
283
5
0
23 May 2023
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shuo Zhang
Liangming Pan
Junzhou Zhao
Wenjie Wang
HILM
245
1
0
23 May 2023
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shahar Katz
Yonatan Belinkov
214
36
0
22 May 2023
Can LLMs facilitate interpretation of pre-trained language models?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Basel Mousi
Nadir Durrani
Fahim Dalvi
303
16
0
22 May 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
International Conference on Learning Representations (ICLR), 2023
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
713
253
0
22 May 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
331
186
0
22 May 2023
Editing Large Language Models: Problems, Methods, and Opportunities
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yunzhi Yao
Peng Wang
Bo Tian
Shuyang Cheng
Zhoubo Li
Shumin Deng
Huajun Chen
Ningyu Zhang
KELM
349
400
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
594
862
0
22 May 2023
Can We Edit Factual Knowledge by In-Context Learning?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ce Zheng
Lei Li
Qingxiu Dong
Yuxuan Fan
Zhiyong Wu
Jingjing Xu
Baobao Chang
KELM
253
289
0
22 May 2023