2202.05262
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,361 papers shown
Scaling Laws for Associative Memories
International Conference on Learning Representations (ICLR), 2023
Vivien A. Cabannes
Elvis Dohmatob
A. Bietti
352
25
0
04 Oct 2023
Can Language Models be Instructed to Protect Personal Information?
Yang Chen
Ethan Mendes
Sauvik Das
Wei Xu
Alan Ritter
PILM
198
47
0
03 Oct 2023
Language Models Represent Space and Time
International Conference on Learning Representations (ICLR), 2023
Wes Gurnee
Max Tegmark
533
233
0
03 Oct 2023
Editing Personality for Large Language Models
Natural Language Processing and Chinese Computing (NLPCC), 2023
Shengyu Mao
Xiaohan Wang
Meng Wang
Yong Jiang
Pengjun Xie
Yan Zhang
Ningyu Zhang
KELM
403
16
0
03 Oct 2023
Modularity in Deep Learning: A Survey
Haozhe Sun
Isabelle Guyon
MoMe
314
7
0
02 Oct 2023
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
Duanyu Feng
Yongfu Dai
Jimin Huang
Yifang Zhang
Qianqian Xie
Weiguang Han
Zhengyu Chen
Alejandro Lopez-Lira
Hao Wang
265
19
0
01 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
275
49
0
30 Sep 2023
RelBERT: Embedding Relations with Language Models
Artificial Intelligence (AIJ), 2023
Asahi Ushio
Jose Camacho-Collados
Steven Schockaert
KELM
324
3
0
30 Sep 2023
Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
T. Han
S. Nebelung
Firas Khader
Tian Wang
Gustav Mueller-Franzes
...
Jens Kleesiek
Christoph Haarburger
Keno K. Bressem
Jakob Nikolas Kather
Daniel Truhn
AAML
95
7
0
29 Sep 2023
KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models
Yiming Ju
Zheng Zhang
KELM
168
9
0
28 Sep 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
International Conference on Learning Representations (ICLR), 2023
Fred Zhang
Neel Nanda
LLMSV
531
175
0
27 Sep 2023
MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering
Takuya Higuchi
Shaochen Xu
Avamarie Brueggeman
Zheng Liu
Tianming Liu
Xiang Li
Ninghao Liu
RALM
232
34
0
27 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
233
0
0
27 Sep 2023
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey
Victoria Smith
Ali Shahin Shamsabadi
Carolyn Ashurst
Adrian Weller
PILM
503
41
0
27 Sep 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
International Conference on Learning Representations (ICLR), 2023
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
213
68
0
26 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG
HILM
255
79
0
26 Sep 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
363
287
0
26 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
International Conference on Machine Learning (ICML), 2023
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
546
238
0
25 Sep 2023
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi
Fabio Massimo Zanzotto
235
7
0
21 Sep 2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
International Conference on Learning Representations (ICLR), 2023
Lukas Berglund
Meg Tong
Max Kaufmann
Mikita Balesni
Asa Cooper Stickland
Tomasz Korbak
Owain Evans
LRM
512
399
0
21 Sep 2023
Knowledge Sanitization of Large Language Models
Yoichi Ishibashi
Hidetoshi Shimodaira
KELM
285
37
0
21 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
241
40
0
19 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
226
17
0
16 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
298
29
0
14 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
International Conference on Learning Representations (ICLR), 2023
Angelica Chen
Ravid Schwartz-Ziv
Dong Wang
Matthew L. Leavitt
Naomi Saphra
577
105
0
13 Sep 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Maximilian Li
Xander Davies
Max Nadeau
KELM
MU
306
34
0
12 Sep 2023
Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Daniel Grzenda
Nathaniel Hudson
André Bauer
Kyle Chard
Ian Foster
KELM
LRM
236
23
0
11 Sep 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
400
73
0
09 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Neural Information Processing Systems (NeurIPS), 2023
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyńska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
262
31
0
07 Sep 2023
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yung-Sung Chuang
Yujia Xie
Hongyin Luo
Yoon Kim
James R. Glass
Pengcheng He
HILM
287
288
0
07 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Computational Linguistics (CL), 2023
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Freda Shi
Shuming Shi
LRM
RALM
HILM
733
828
0
03 Sep 2023
Explainability for Large Language Models: A Survey
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
710
0
02 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt
MILM
316
249
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
215
16
0
01 Sep 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
275
46
0
27 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Lin Geng Foo
Hossein Rahmani
Jing Liu
770
49
0
27 Aug 2023
Unified Concept Editing in Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Rohit Gandikota
Hadas Orgad
Yonatan Belinkov
Joanna Materzyńska
David Bau
DiffM
380
303
0
25 Aug 2023
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuheng Chen
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
334
59
0
25 Aug 2023
Overcoming Generic Knowledge Loss with Selective Parameter Update
Computer Vision and Pattern Recognition (CVPR), 2023
Wenxuan Zhang
Paul Janson
Rahaf Aljundi
Mohamed Elhoseiny
KELM
CLL
380
20
0
23 Aug 2023
Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
Neural Networks (Neural Netw.), 2023
Adrián Csiszárik
M. Kiss
Péter Kőrösi-Szabó
Márton Muntag
Gergely Papp
D. Varga
MoMe
183
1
0
22 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao Song
Junze Yin
179
22
0
21 Aug 2023
DocTER: Evaluating Document-based Knowledge Editing
Information Processing & Management (IPM), 2023
Suhang Wu
Minlong Peng
Y. Lin
Wenbo Li
Mingming Sun
Jinsong Su
KELM
302
39
0
19 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
International Conference on Learning Representations (ICLR), 2023
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
335
140
0
17 Aug 2023
PMET: Precise Model Editing in a Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xiaopeng Li
Shasha Li
Shezheng Song
Jing Yang
Jun Ma
Jie Yu
KELM
537
183
0
17 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM
MU
211
39
0
16 Aug 2023
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang
Ningyu Zhang
Bo Tian
Zekun Xi
Yunzhi Yao
...
Shuyang Cheng
Kangwei Liu
Yuansheng Ni
Guozhou Zheng
Huajun Chen
KELM
258
82
0
14 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
123
1
0
04 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
272
43
0
03 Aug 2023
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
Avijit Ghosh
D. Lakshmi
88
7
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
ACM Multimedia (ACM MM), 2023
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
178
18
0
02 Aug 2023