arXiv:2408.04532
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou
8 August 2024

Cited By
Papers citing "How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression" (6 of 6 papers shown)
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
17 Oct 2024

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
30 Jan 2024

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li, Yuanzhi Li, Andrej Risteski
07 Mar 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Probing for Bridging Inference in Transformer Language Models
Onkar Pandit, Yufang Hou
19 Apr 2021