ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.15712
  4. Cited By
Infinite Limits of Multi-head Transformer Dynamics

Infinite Limits of Multi-head Transformer Dynamics

24 May 2024
Blake Bordelon
Hamza Tahir Chaudhry
C. Pehlevan
    AI4CE
ArXivPDFHTML

Papers citing "Infinite Limits of Multi-head Transformer Dynamics"

11 / 11 papers shown
Title
Don't be lazy: CompleteP enables compute-efficient deep transformers
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
37
0
0
02 May 2025
Deep Neural Nets as Hamiltonians
Deep Neural Nets as Hamiltonians
Mike Winer
Boris Hanin
41
0
0
31 Mar 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
39
1
0
24 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang
Aaditya K. Singh
Peter E. Latham
Andrew Saxe
MLT
55
1
0
28 Jan 2025
How Does Critical Batch Size Scale in Pre-training?
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
64
8
0
29 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander B. Atanasov
Alexandru Meterez
James B. Simon
C. Pehlevan
43
2
0
06 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
C. Pehlevan
44
11
0
26 Sep 2024
A Dynamical Model of Neural Scaling Laws
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
C. Pehlevan
44
36
0
02 Feb 2024
The Influence of Learning Rule on Representation Dynamics in Wide Neural
  Networks
The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks
Blake Bordelon
C. Pehlevan
36
22
0
05 Oct 2022
SCENIC: A JAX Library for Computer Vision Research and Beyond
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
41
67
0
18 Oct 2021
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
1