2202.05262
Cited By
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing "Locating and Editing Factual Associations in GPT"
50 / 924 papers shown
Title
Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
Suhang Wu
Minlong Peng
Yue Chen
Jinsong Su
Mingming Sun
KELM
40
35
0
19 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
17
83
0
17 Aug 2023
PMET: Precise Model Editing in a Transformer
Xiaopeng Li
Shasha Li
Shezheng Song
Jing Yang
Jun Ma
Jie Yu
KELM
26
115
0
17 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
M. Zhang
KELM
MU
25
26
0
16 Aug 2023
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang
Ningyu Zhang
Bo Tian
Zekun Xi
Yunzhi Yao
...
Shuyang Cheng
Kangwei Liu
Yuansheng Ni
Guozhou Zheng
Huajun Chen
KELM
27
41
0
14 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
16
0
0
04 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
30
27
0
03 Aug 2023
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
Avijit Ghosh
D. Lakshmi
22
3
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
22
10
0
02 Aug 2023
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath
Matthew Rahtz
János Kramár
Vladimir Mikulik
Shane Legg
MILM
LRM
15
68
0
28 Jul 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
36
5
0
28 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
44
470
0
27 Jul 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen
Eden Biran
Ori Yoran
Amir Globerson
Mor Geva
KELM
40
155
0
24 Jul 2023
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
Neel Guha
Mayee F. Chen
Kush S. Bhatia
Azalia Mirhoseini
Frederic Sala
Christopher Ré
24
4
0
20 Jul 2023
Deceptive Alignment Monitoring
Andres Carranza
Dhruv Pai
Rylan Schaeffer
Arnuv Tandon
Oluwasanmi Koyejo
37
7
0
20 Jul 2023
Can Neural Network Memorization Be Localized?
Pratyush Maini
Michael C. Mozer
Hanie Sedghi
Zachary Chase Lipton
J. Zico Kolter
Chiyuan Zhang
TDI
36
46
0
18 Jul 2023
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Danny Halawi
Jean-Stanislas Denain
Jacob Steinhardt
28
52
0
18 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
21
100
0
18 Jul 2023
Discovering Variable Binding Circuitry with Desiderata
Xander Davies
Max Nadeau
Nikhil Prakash
Tamar Rott Shaham
David Bau
23
12
0
07 Jul 2023
An Overview of Catastrophic AI Risks
Dan Hendrycks
Mantas Mazeika
Thomas Woodside
SILM
21
165
0
21 Jun 2023
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
34
8
0
16 Jun 2023
Propagating Knowledge Updates to LMs Through Distillation
Shankar Padmanabhan
Yasumasa Onoe
Michael J.Q. Zhang
Greg Durrett
Eunsol Choi
KELM
10
18
0
15 Jun 2023
Operationalising Representation in Natural Language Processing
J. Harding
28
11
0
14 Jun 2023
Measuring and Modifying Factual Knowledge in Large Language Models
Pouya Pezeshkpour
KELM
14
17
0
09 Jun 2023
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
Shizhe Diao
Tianyang Xu
Ruijia Xu
Jiawei Wang
Tong Zhang
MoE
AI4CE
11
36
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
17
6
0
06 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
26
472
0
06 Jun 2023
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Owen Queen
Thomas Hartvigsen
Teddy Koker
Huan He
Theodoros Tsiligkaridis
Marinka Zitnik
AI4TS
37
17
0
03 Jun 2023
Learning Transformer Programs
Dan Friedman
Alexander Wettig
Danqi Chen
28
32
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
21
81
0
01 Jun 2023
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
Dana Arad
Hadas Orgad
Yonatan Belinkov
KELM
41
18
0
01 Jun 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Liang Zhao
ALM
23
126
0
30 May 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Z. Wang
Alexander Ku
Jason Baldridge
Thomas L. Griffiths
Been Kim
UQCV
21
11
0
29 May 2023
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
J. Hoelscher-Obermaier
Julia Persson
Esben Kran
Ioannis Konstas
Fazl Barez
KELM
19
56
0
27 May 2023
Theoretical and Practical Perspectives on what Influence Functions Do
Andrea Schioppa
Katja Filippova
Ivan Titov
Polina Zablotskaia
TDI
19
14
0
26 May 2023
Backpack Language Models
John Hewitt
John Thickstun
Christopher D. Manning
Percy Liang
KELM
13
16
0
26 May 2023
ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
Yu-xin Zhang
Weiming Dong
Fan Tang
Nisha Huang
Haibin Huang
Chongyang Ma
Tong-Yee Lee
Oliver Deussen
Changsheng Xu
DiffM
27
75
0
25 May 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
26
52
0
25 May 2023
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler
Jingxuan He
Slobodan Jenko
Martin Vechev
HILM
16
108
0
25 May 2023
Editable Graph Neural Network for Node Classifications
Zirui Liu
Zhimeng Jiang
Shaochen Zhong
Kaixiong Zhou
Li Li
Rui Chen
Soo-Hyun Choi
Xia Hu
17
6
0
24 May 2023
Referral Augmentation for Zero-Shot Information Retrieval
Michael Tang
Shunyu Yao
John Yang
Karthik Narasimhan
28
3
0
24 May 2023
Meta-Learning Online Adaptation of Language Models
Nathan J. Hu
E. Mitchell
Christopher D. Manning
Chelsea Finn
KELM
21
34
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
33
47
0
24 May 2023
Editing Common Sense in Transformers
Anshita Gupta
Debanjan Mondal
Akshay Krishna Sheshadri
Wenlong Zhao
Xiang Lorraine Li
Sarah Wiegreffe
Niket Tandon
KELM
34
21
0
24 May 2023
Mitigating Temporal Misalignment by Discarding Outdated Facts
Michael J.Q. Zhang
Eunsol Choi
KELM
HILM
24
17
0
24 May 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Zexuan Zhong
Zhengxuan Wu
Christopher D. Manning
Christopher Potts
Danqi Chen
KELM
24
185
0
24 May 2023
Can Transformers Learn to Solve Problems Recursively?
Shizhuo Zhang
Curt Tigges
Stella Biderman
Maxim Raginsky
Talia Ringer
15
13
0
24 May 2023
All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren
Qipeng Guo
Zhijing Jin
Shauli Ravfogel
Mrinmaya Sachan
Bernhard Schölkopf
Ryan Cotterell
22
4
0
23 May 2023
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models
Shashank Sonkar
Richard G. Baraniuk
16
1
0
23 May 2023
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina J. Semnani
Violet Z. Yao
He Zhang
M. Lam
KELM
AI4MH
20
72
0
23 May 2023