Transformers learn in-context by gradient descent

International Conference on Machine Learning (ICML), 2023
15 December 2022
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (361★)

Papers citing "Transformers learn in-context by gradient descent"

50 / 455 papers shown
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
Ameen Ali, Lior Wolf, Ivan Titov
02 Oct 2024

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models — International Conference on Learning Representations (ICLR), 2024
Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz
02 Oct 2024

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024

Transformers Handle Endogeneity in In-Context Linear Regression — International Conference on Learning Representations (ICLR), 2024
Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai
02 Oct 2024

Racing Thoughts: Explaining Contextualization Errors in Large Language Models — North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Michael A. Lepori, Michael Mozer, Asma Ghandeharioun
02 Oct 2024

Attention layers provably solve single-location regression — International Conference on Learning Representations (ICLR), 2024
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024
"Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models — Knowledge Discovery and Data Mining (KDD), 2024
Ricardo Knauer, Mario Koddenbrock, Raphael Wallsberger, Nicholas M. Brisson, Georg N. Duda, Deborah Falla, David W. Evans, Erik Rodner
27 Sep 2024

Non-asymptotic Convergence of Training Transformers for Next-token Prediction — Neural Information Processing Systems (NeurIPS), 2024
Ruiquan Huang, Yingbin Liang, Jing Yang
25 Sep 2024

Generalization vs. Specialization under Concept Shift
Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn
23 Sep 2024

Focused Large Language Models are Stable Many-Shot Learners — Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Y. Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li
26 Aug 2024

Multimodal Contrastive In-Context Learning
Yosuke Miyanishi, Minh Le Nguyen
23 Aug 2024
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He, Jascha Achterberg, Katie Collins, Kevin K. Nejad, Danyal Akarca, ..., Chole Li, Kai J. Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay
22 Aug 2024

Transformers are Minimax Optimal Nonparametric In-Context Learners — Neural Information Processing Systems (NeurIPS), 2024
Juno Kim, Tai Nakamaki, Taiji Suzuki
22 Aug 2024

Learning Randomized Algorithms with Transformers — International Conference on Learning Representations (ICLR), 2024
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger
20 Aug 2024

In-Context Learning with Representations: Contextual Generalization of Trained Transformers — Neural Information Processing Systems (NeurIPS), 2024
Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi
19 Aug 2024
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
Chenming Tang, Zhixiang Wang, Hao Sun, Yunfang Wu
16 Aug 2024

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression — Neural Information Processing Systems (NeurIPS), 2024
Xingwu Chen, Lei Zhao, Difan Zou
08 Aug 2024

Pre-training and in-context learning IS Bayesian inference a la De Finetti
Naimeng Ye, Hanming Yang, Andrew Siah, Hongseok Namkoong
06 Aug 2024

Spin glass model of in-context learning — Physical Review E (Phys. Rev. E), 2024
Yuhao Li, Ruoran Bai, Haiping Huang
05 Aug 2024

Intermittent Semi-working Mask: A New Masking Paradigm for LLMs
Mingcong Lu, Jiangcai Zhu, Wang Hao, Zheng Li, Shusheng Zhang, Kailai Shao, Chao Chen, Nan Li, Feng Wang, Xin Lu
01 Aug 2024
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
22 Jul 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak
13 Jul 2024

ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
Wai Man Si, Michael Backes, Yang Zhang
09 Jul 2024

Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Yongqi Leng, Deyi Xiong
09 Jul 2024

Expressivity of Neural Networks with Random Weights and Learned Biases
Ezekiel Williams, Avery Hee-Woon Ryoo, Thomas Jiralerspong, Alexandre Payeur, M. Perich, Luca Mazzucato, Guillaume Lajoie
01 Jul 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi
24 Jun 2024

Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning
Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang
23 Jun 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler
20 Jun 2024

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria
17 Jun 2024

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu
17 Jun 2024
Probing the Decision Boundaries of In-context Learning in Large Language Models
Siyan Zhao, Tung Nguyen, Aditya Grover
17 Jun 2024

Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade
13 Jun 2024

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii
13 Jun 2024

Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification
Quangao Liu, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou
11 Jun 2024

Estimating the Hallucination Rate of Generative AI
Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David M. Blei
11 Jun 2024
What Do Language Models Learn in Context? The Structured Task Hypothesis — Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiaoda Li, Buse Giledereli, Mrinmaya Sachan, Robert Bamler
06 Jun 2024

On Limitation of Transformer for Learning HMMs
Jiachen Hu, Qinghua Liu, Chi Jin
06 Jun 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
06 Jun 2024

Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou, Deqing Fu, Willie Neiswanger, Robin Jia
05 Jun 2024

Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Mahdi Sabbaghi, George Pappas, Hamed Hassani, Surbhi Goel
04 Jun 2024

Universal In-Context Approximation By Prompting Fully Recurrent Models
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Juil Sock, Adel Bibi
03 Jun 2024
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng
03 Jun 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner, Shreyas Kapur, Vasil Georgiev, Cameron Allen, Scott Emmons, Stuart J. Russell
02 Jun 2024

How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures
Kevin Christian Wibisono, Yixin Wang
31 May 2024

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
30 May 2024

Does learning the right latent variables necessarily improve in-context learning?
Sarthak Mittal, Eric Elmoznino, Léo Gagnon, Sangnie Bhardwaj, Tom Marty, Dhanya Sridhar, Guillaume Lajoie
29 May 2024
A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang
28 May 2024

IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar, Faez Ahmed, Olga Fink
28 May 2024

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
27 May 2024