Re-examining learning linear functions in context

Deutsche Jahrestagung für Künstliche Intelligenz (German Conference on Artificial Intelligence, KI), 2024
Abstract

In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. However, our understanding of how ICL works remains limited. We investigate ICL of univariate linear functions in a controlled setup with synthetic training data, experimenting with a range of GPT-2-like transformer models trained from scratch. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches such as linear regression to learn a linear function in-context: the models fail to generalize beyond their training distribution, highlighting fundamental limitations in their capacity to infer abstract task structure. Based on these experiments, we propose a mathematically precise hypothesis of what the model might actually be learning.
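To make the setup concrete, below is a minimal sketch of how synthetic ICL training data for univariate linear functions is commonly generated in this line of work. The standard-normal sampling of slopes and inputs, the prompt length, and the function name are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def sample_icl_prompt(num_examples=40, rng=np.random.default_rng()):
    """Build one in-context prompt for a univariate linear function.

    Assumed protocol (illustrative, not necessarily the paper's setup):
    sample a slope w ~ N(0, 1) and inputs x_i ~ N(0, 1), then interleave
    (x_1, y_1, ..., x_k, y_k, x_query) as the model input, with
    y_query = w * x_query as the regression target.
    """
    w = rng.standard_normal()                   # task: f(x) = w * x
    xs = rng.standard_normal(num_examples + 1)  # last x is the query point
    ys = w * xs
    prompt = np.empty(2 * num_examples + 1)
    prompt[0::2] = xs                           # x_1, _, x_2, _, ..., x_query
    prompt[1::2] = ys[:-1]                      # y_1, ..., y_k interleaved
    return prompt, ys[-1]                       # (model input, query target)

prompt, target = sample_icl_prompt()
print(prompt.shape, target)  # (81,) input sequence, scalar target
```

Under such a protocol, testing generalization beyond the training distribution typically means drawing the slope w or the inputs x from a wider range at evaluation time than was seen during training, which is where the failures reported above arise.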

11 pages main text, 10 figures, 5 tables; 3-page bibliography; 6-page appendix.