ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.16845
  4. Cited By
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence
  and Capability

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

27 May 2024
Chenyu Zheng
Wei Huang
Rongzheng Wang
Guoqiang Wu
Jun Zhu
Chongxuan Li
ArXivPDFHTML

Papers citing "On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability"

8 / 8 papers shown
Title
How do Transformers perform In-Context Autoregressive Learning?
How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander
Raja Giryes
Taiji Suzuki
Mathieu Blondel
Gabriel Peyré
16
7
0
08 Feb 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui
Jie Ren
Pengfei He
Jiliang Tang
Yue Xing
34
12
0
30 Jan 2024
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear
  Regression?
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu
Difan Zou
Zixiang Chen
Vladimir Braverman
Quanquan Gu
Peter L. Bartlett
116
48
0
12 Oct 2023
The Learnability of In-Context Learning
The Learnability of In-Context Learning
Noam Wies
Yoav Levine
Amnon Shashua
114
89
0
14 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic
  Understanding
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
107
61
0
07 Mar 2023
Autoregressive Image Generation using Residual Quantization
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
168
324
0
03 Mar 2022
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Linear Convergence of Gradient and Proximal-Gradient Methods Under the
  Polyak-Łojasiewicz Condition
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark W. Schmidt
119
1,190
0
16 Aug 2016
1