

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

23 February 2024
Hongkang Li
Meng Wang
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
    MLT
arXiv: 2402.15607

Papers citing "How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?"

All 18 citing papers are shown below.
1. Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems
   Hongbo Li, Qinhang Wu, Sen-Fon Lin, Yingbin Liang, Ness B. Shroff. MoE. 30 Oct 2025.
2. Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
   Yanna Ding, Songtao Lu, Yingdong Lu, T. Nowicki, Jianxi Gao. 21 Oct 2025.
3. Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
   Junsoo Oh, Wei Huang, Taiji Suzuki. 14 Oct 2025.
4. Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
   Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh. MLT. 12 Oct 2025.
5. Multi-Layer Attention is the Amplifier of Demonstration Effectiveness
   Dingzirui Wang, Xuangliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng. 01 Aug 2025.
6. To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA
   Shugang Hao, Hongbo Li, Lingjie Duan. 31 Jul 2025.
7. Provable In-Context Learning of Nonlinear Regression with Transformers
   Hongbo Li, Lingjie Duan, Yingbin Liang. 28 Jul 2025.
8. Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
   Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang. MoMe. 10 Jun 2025.
9. When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers. International Conference on Learning Representations (ICLR), 2025.
   Hongkang Li, Yihua Zhang, Shuai Zhang, Ming Wang, Sijia Liu, Pin-Yu Chen. MoMe. 15 Apr 2025.
10. In-Context Learning with Hypothesis-Class Guidance
    Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee. 27 Feb 2025.
11. Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting
    Yingying Zhang, Zhikai Wu, Jian Li, Wenshu Fan. MLT, AI4CE. 18 Feb 2025.
12. Evaluating the Prompt Steerability of Large Language Models
    Erik Miehling, Michael Desmond, Karthikeyan N. Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu. LLMSV. 19 Nov 2024.
13. AERO: Entropy-Guided Framework for Private LLM Inference
    N. Jha, Brandon Reagen. 16 Oct 2024.
14. Can In-context Learning Really Generalize to Out-of-distribution Tasks? International Conference on Learning Representations (ICLR), 2024.
    Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying. OOD. 13 Oct 2024.
15. Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis. International Conference on Learning Representations (ICLR), 2024.
    Hongkang Li, Songtao Lu, Pin-Yu Chen, Xiaodong Cui, Meng Wang. LRM. 03 Oct 2024.
16. Differentially Private Kernel Density Estimation
    Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.
17. What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
    Hongkang Li, Meng Wang, Tengfei Ma, Sijia Liu, Zaixi Zhang, Pin-Yu Chen. MLT, AI4CE. 04 Jun 2024.
18. A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
    Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers. MoE. 26 May 2024.