Transformers generalize differently from information stored in context vs in weights

11 October 2022

Stephanie C. Y. Chan

Ishita Dasgupta

Andrew Kyle Lampinen

Papers citing "Transformers generalize differently from information stored in context vs in weights"

On the generalization of language models from in-context learning and finetuning: a controlled study. Andrew Kyle Lampinen, Arslan Chaudhry, Stephanie Chan, Cody Wild, Diane Wan, Alex Ku, Jorg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland. 01 May 2025.
Toward Understanding In-context vs. In-weight Learning. Bryan Chan, Xinyi Chen, András Gyorgy, Dale Schuurmans. 30 Oct 2024.
When does compositional structure yield compositional generalization? A kernel theory. Samuel Lippl, Kim Stachenfeld. 26 May 2024.
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History. Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark J. F. Gales, Mario Fritz. 28 Feb 2024.
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections. Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montse Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh. 17 Nov 2023.
In-Context Learning Learns Label Relationships but Is Not Conventional Learning. Jannik Kossen, Y. Gal, Tom Rainforth. 23 Jul 2023.
Large Language Models as General Pattern Machines. Suvir Mirchandani, F. Xia, Peter R. Florence, Brian Ichter, Danny Driess, Montse Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng. 10 Jul 2023.
Passive learning of active causal strategies in agents and language models. Andrew Kyle Lampinen, Stephanie C. Y. Chan, Ishita Dasgupta, A. Nam, Jane X. Wang. 25 May 2023.
The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. 18 Apr 2021.