Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.18400
Cited By
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
26 June 2024
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers"
19 / 19 papers shown
Title
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
44
0
0
14 Mar 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li
Chenyang Zhang
Xingwu Chen
Yuan Cao
Difan Zou
67
0
0
24 Feb 2025
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory
Shuo Wang
Issei Sato
69
0
0
16 Dec 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
47
1
0
22 Jul 2024
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu
Muhan Gao
Kuai Yu
Adam Byerly
Daniel Khashabi
37
11
0
20 Jun 2024
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Heejune Sheen
Siyu Chen
Tianhao Wang
Harrison H. Zhou
MLT
23
10
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Gemma Team Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
123
415
0
13 Mar 2024
On the Origins of Linear Representations in Large Language Models
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
Victor Veitch
59
24
0
06 Mar 2024
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Arijit Ghosh Chowdhury
Md. Mofijul Islam
Vaibhav Kumar
F. H. Shezan
Vaibhav Kumar
Vinija Jain
Aman Chadha
AAML
PILM
21
4
0
03 Mar 2024
Learning Associative Memories with Gradient Descent
Vivien A. Cabannes
Berfin Simsek
A. Bietti
30
3
0
28 Feb 2024
InstructEdit: Instruction-based Knowledge Editing for Large Language Models
Ningyu Zhang
Bo Tian
Siyuan Cheng
Xiaozhuan Liang
Yi Hu
Kouying Xue
Yanjie Gou
Xi Chen
Huajun Chen
KELM
40
4
0
25 Feb 2024
Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models
Goutham Rajendran
Simon Buchholz
Bryon Aragam
Bernhard Schölkopf
Pradeep Ravikumar
AI4CE
83
19
0
14 Feb 2024
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
260
0
28 Apr 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
98
61
0
07 Mar 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
73
98
0
05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Fast Model Editing at Scale
E. Mitchell
Charles Lin
Antoine Bosselut
Chelsea Finn
Christopher D. Manning
KELM
219
341
0
21 Oct 2021
Sorting through the noise: Testing robustness of information processing in pre-trained language models
Lalchand Pandia
Allyson Ettinger
34
37
0
25 Sep 2021
1