ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
arXiv:2203.03466 (7 March 2022)
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao

Papers citing "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer"

50 / 122 papers shown
Language models scale reliably with over-training and on downstream tasks
S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
ALM, ELM, LRM
13 Mar 2024

Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin
AI4CE
27 Feb 2024

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhimin Luo
26 Feb 2024

Quantum linear algebra is all you need for Transformer architectures
Naixu Guo, Zhan Yu, Matthew Choi, Aman Agrawal, Kouhei Nakaji, Alán Aspuru-Guzik, P. Rebentrost
AI4CE
26 Feb 2024

LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou, Nikhil Ghosh, Bin Yu
AI4CE
19 Feb 2024

Estimating the Local Learning Coefficient at Scale
Zach Furman, Edmund Lau
06 Feb 2024

Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo
AI4TS
04 Feb 2024

DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom, Axel Feldmann, A. Madry
OODD
23 Jan 2024

LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu, Aurick Qiao, W. Neiswanger, Hongyi Wang, Bowen Tan, ..., Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric P. Xing
11 Dec 2023

DiSK: A Diffusion Model for Structured Knowledge
O. Kitouni, Niklas Nolte, James Hensman, Bhaskar Mitra
DiffM
08 Dec 2023

Scaling Laws in Jet Classification
Joshua D. Batson, Yonatan Kahn
04 Dec 2023

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin
AI4CE, BDL
24 Nov 2023

A Spectral Condition for Feature Learning
Greg Yang, James B. Simon, Jeremy Bernstein
26 Oct 2023

MatFormer: Nested Transformer for Elastic Inference
Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, ..., Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
11 Oct 2023

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, ..., Thérence Bois, Andrew Fitzgibbon, Błażej Banaszewski, Chad Martin, Dominic Masters
AI4CE
06 Oct 2023

Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, ..., Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
ELM, LRM
05 Oct 2023

Scaling Laws for Associative Memories
Vivien A. Cabannes, Elvis Dohmatob, A. Bietti
04 Oct 2023

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou
MLT
03 Oct 2023

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du
01 Oct 2023

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, C. Pehlevan
28 Sep 2023

Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
25 Sep 2023

FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, ..., Li Du, Bowen Qin, Zheng-Wei Zhang, Aixin Sun, Yequan Wang
07 Sep 2023

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
07 Sep 2023

Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin G. Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
KELM
08 Aug 2023

Large Language Models
Michael R. Douglas
LLMAG, LM&MA
11 Jul 2023

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy
30 Jun 2023

Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou
01 Jun 2023

Improving Energy Conserving Descent for Machine Learning: Theory and Practice
G. Luca, Alice Gatti, E. Silverstein
01 Jun 2023

Likelihood-Based Diffusion Language Models
Ishaan Gulrajani, Tatsunori B. Hashimoto
DiffM
30 May 2023

A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, G. Rochette, S. Mallat
29 May 2023

Learning Capacity: A Measure of the Effective Dimensionality of a Model
Daiwei Chen, Wei-Di Chang, Pratik Chaudhari
27 May 2023

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian, Yiping Wang, Beidi Chen, S. Du
MLT
25 May 2023

Depth Dependence of μP Learning Rates in ReLU MLPs
Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
13 May 2023

nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao, Siqi Fan, Xiusheng Huang, Xuezhi Fang, Xiang Li, ..., Peng Han, Shuo Shang, Kang Liu, Aixin Sun, Yequan Wang
14 Apr 2023

Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue
ODL
11 Apr 2023

Effective Theory of Transformers at Initialization
Emily Dinan, Sho Yaida, Susan Zhang
04 Apr 2023

Scaling Expert Language Models with Unsupervised Domain Discovery
Suchin Gururangan, Margaret Li, M. Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
MoE
24 Mar 2023

Unit Scaling: Out-of-the-Box Low-Precision Training
Charlie Blake, Douglas Orr, Carlo Luschi
MQ
20 Mar 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, J. Susskind
AAML
11 Mar 2023

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy
Natalia Ponomareva, Hussein Hazimeh, Alexey Kurakin, Zheng Xu, Carson E. Denison, H. B. McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta
01 Mar 2023

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Dayal Singh Kalra, M. Barkeshli
23 Feb 2023

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Agustinus Kristiadi, Felix Dangel, Philipp Hennig
14 Feb 2023

PyGlove: Efficiently Exchanging ML Ideas as Code
Daiyi Peng, Xuanyi Dong, Esteban Real, Yifeng Lu, Quoc V. Le
03 Feb 2023

Width and Depth Limits Commute in Residual Networks
Soufiane Hayou, Greg Yang
01 Feb 2023

Scaling Laws for Hyperparameter Optimization
Arlind Kadra, Maciej Janowski, Martin Wistuba, Josif Grabocka
01 Feb 2023

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li, Z. Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He
07 Dec 2022

Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik, T. Can, Paolo Glorioso
28 Oct 2022

Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos, A. Ho, J. Sevilla, T. Besiroglu, Lennart Heim, Marius Hobbhahn
ALM
26 Oct 2022

Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein
AI4CE
18 Oct 2022

18 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
19
47
0
13 Oct 2022