Transformers learn in-context by gradient descent
arXiv:2212.07677 (v2 is the latest version)

International Conference on Machine Learning (ICML), 2023
15 December 2022
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
    MLT

Papers citing "Transformers learn in-context by gradient descent"

50 / 456 papers shown
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja
Reduan Achtibat
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
321
4
0
21 May 2025
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
Quan Nguyen
Thanh Nguyen-Tang
MLT
366
1
0
21 May 2025
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
Yukun Zhao
Lingyong Yan
Zhenyang Li
Shuaiqiang Wang
Zhumin Chen
Zhaochun Ren
Dawei Yin
CLL KELM VLM LRM
242
0
0
21 May 2025
Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex
Muquan Yu
Mu Nan
Hossein Adeli
Jacob S. Prince
John A. Pyles
Leila Wehbe
Margaret M. Henderson
Michael J. Tarr
Andrew F. Luo
MedIm ViT
229
0
0
21 May 2025
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective
Soo Min Kwon
Alec S. Xu
Can Yaras
Laura Balzano
Qing Qu
OOD
218
1
0
20 May 2025
Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
466
1
0
20 May 2025
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics
Christoph Jürgen Hemmer
Daniel Durstewitz
AI4TS SyDa AI4CE
523
4
0
19 May 2025
Attention-based clustering
Rodrigo Maulen-Soto
Claire Boyer
Pierre Marion
296
0
0
19 May 2025
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman
Zhao Song
308
23
0
17 May 2025
Do different prompting methods yield a common task representation in language models?
Guy Davidson
Todd M. Gureckis
Brenden M. Lake
Adina Williams
339
4
0
17 May 2025
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Jingcheng Niu
Subhabrata Dutta
Ahmed Elshabrawy
Harish Tayyar Madabushi
Iryna Gurevych
546
2
0
16 May 2025
Permutation Randomization on Nonsmooth Nonconvex Optimization: A Theoretical and Experimental Study
Wei Zhang
Arif Hassan Zidan
Tianming Liu
164
0
0
16 May 2025
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
Yuanzhao Zhang
William Gilpin
AI4TS
207
1
0
16 May 2025
Big Data and the Computational Social Science of Entrepreneurship and Innovation
Ningzi Li
Shiyang Lai
James Evans
AILaw
190
0
0
13 May 2025
Rethinking Invariance in In-context Learning
International Conference on Learning Representations (ICLR), 2025
Lizhe Fang
Yifei Wang
Khashayar Gatmiry
Lei Fang
Yun Wang
315
8
0
08 May 2025
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
392
6
0
08 May 2025
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai
Zhiqiang Xie
Kayvon Fatahalian
LLMAG
436
5
0
01 May 2025
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
637
20
0
01 May 2025
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Jiadong Wang
Tianci Luo
Yaohua Zha
Yan Feng
Ruisheng Luo
Bin Chen
Tao Dai
Long Chen
Yaowei Wang
Shu-Tao Xia
VLM
272
0
0
30 Apr 2025
In-Context Learning can distort the relationship between sequence likelihoods and biological fitness
Pranav Kantroo
Günter P. Wagner
Benjamin B. Machta
320
0
0
23 Apr 2025
From predictions to confidence intervals: an empirical study of conformal prediction methods for in-context learning
Symposium on Advances in Approximate Bayesian Inference (AABI), 2025
Zhe Huang
Simone Rossi
Rui Yuan
T. Hannagan
223
2
0
22 Apr 2025
Scaling sparse feature circuit finding for in-context learning
Dmitrii Kharlapenko
Shivalika Singh
Fazl Barez
Arthur Conmy
Neel Nanda
272
3
0
18 Apr 2025
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Nischal Mainali
Lucas Teixeira
269
2
0
17 Apr 2025
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun
Y. Gai
Lijie Chen
Abhilasha Ravichander
Yejin Choi
Basel Alomair
HILM
324
10
0
17 Apr 2025
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
Hansi Zeng
Kai Hui
Honglei Zhuang
Zhen Qin
Zhenrui Yue
Hamed Zamani
Dana Alon
177
1
0
16 Apr 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
International Conference on Learning Representations (ICLR), 2025
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
733
18
0
15 Apr 2025
Long Context In-Context Compression by Getting to the Gist of Gisting
Aleksandar Petrov
Mark Sandler
A. Zhmoginov
Nolan Miller
Max Vladymyrov
271
2
0
11 Apr 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
167
4
0
06 Apr 2025
Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent
Yi Xu
Weicong Qin
Weijie Yu
Ming He
Jianping Fan
Jun Xu
201
3
0
06 Apr 2025
An extension of linear self-attention for in-context learning
Katsuyuki Hagiwara
217
0
0
31 Mar 2025
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Aleksandra Bakalova
Yana Veitsman
Xinting Huang
Michael Hahn
270
6
0
31 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM ELM LRM
963
5
0
28 Mar 2025
Decision Feedback In-Context Learning for Wireless Symbol Detection
Li Fan
Jing Yang
Cong Shen
418
0
0
20 Mar 2025
Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency
Jiangxuan Long
Zhao Song
Chiwun Yang
AI4TS
894
2
0
18 Mar 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
Alireza Mousavi-Hosseini
Clayton Sanford
Denny Wu
Murat A. Erdogdu
315
3
0
14 Mar 2025
Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows
Chengyue Gong
Xiaoyu Li
Yingyu Liang
Jiangxuan Long
Zhenmei Shi
Zhao Song
Yu Tian
255
9
0
12 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
270
3
0
03 Mar 2025
Provable Benefits of Task-Specific Prompts for In-context Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Xiangyu Chang
Yingcong Li
Muti Kara
Samet Oymak
Amit K. Roy-Chowdhury
330
1
0
03 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
International Conference on Learning Representations (ICLR), 2025
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
396
7
0
03 Mar 2025
Learning to Substitute Components for Compositional Generalization
Hao Sun
Gangwei Jiang
Chenwang Wu
Ying Wei
Defu Lian
Tong Xu
280
0
0
28 Feb 2025
In-Context Learning with Hypothesis-Class Guidance
Ziqian Lin
Shubham Kumar Bharti
Kangwook Lee
453
0
0
27 Feb 2025
Ask, and it shall be given: On the Turing completeness of prompting
International Conference on Learning Representations (ICLR), 2024
Ruizhong Qiu
Zhe Xu
Wenxuan Bao
Hanghang Tong
ReLM LRM AI4CE
364
0
0
24 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
360
4
0
24 Feb 2025
In-context Learning of Evolving Data Streams with Tabular Foundational Models
Afonso Lourenço
João Gama
Eric P. Xing
Goreti Marreiros
395
0
0
24 Feb 2025
Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization
International Conference on Learning Representations (ICLR), 2025
Zixuan Gong
Xiaolin Hu
Huayi Tang
Yong Liu
316
2
0
24 Feb 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li
Chenyang Zhang
Xingwu Chen
Yuan Cao
Difan Zou
365
3
0
24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Yufa Zhou
572
22
0
21 Feb 2025
CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Vignesh Kothapalli
Hamed Firooz
Maziar Sanjabi
395
0
0
21 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations
International Conference on Learning Representations (ICLR), 2024
Yufan Zhuang
Chandan Singh
Liyuan Liu
Jingbo Shang
Jianfeng Gao
404
10
0
21 Feb 2025
In-Context Parametric Inference: Point or Distribution Estimators?
Sarthak Mittal
Yoshua Bengio
Nikolay Malkin
Guillaume Lajoie
269
1
0
17 Feb 2025