Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

7 March 2022
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao

Papers citing "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer"

Showing 22 of 122 citing papers.

A Kernel-Based View of Language Model Fine-Tuning (11 Oct 2022)
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

Multi-step Planning for Automated Hyperparameter Optimization with OptFormer (10 Oct 2022)
Lucio Dery, A. Friesen, Nando de Freitas, Marc'Aurelio Ranzato, Yutian Chen

Meta-Principled Family of Hyperparameter Scaling Strategies (10 Oct 2022)
Sho Yaida

Second-order regression models exhibit progressive sharpening to the edge of stability (10 Oct 2022)
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

Joint Embedding Self-Supervised Learning in the Kernel Regime (29 Sep 2022)
B. Kiani, Randall Balestriero, Yubei Chen, S. Lloyd, Yann LeCun

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization (06 Jun 2022)
Mufan Bill Li, Mihai Nica, Daniel M. Roy

Dataset Distillation using Neural Feature Regression (01 Jun 2022)
Yongchao Zhou, E. Nezhadarya, Jimmy Ba

A Framework for Overparameterized Learning (26 May 2022)
Dávid Terjék, Diego González-Sánchez

A Case of Exponential Convergence Rates for SVM (20 May 2022)
Vivien A. Cabannes, S. Vigogna

GPT-NeoX-20B: An Open-Source Autoregressive Language Model (14 Apr 2022)
Sid Black, Stella Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, J. Tow, Ben Wang, Samuel Weinbach

GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets (06 Apr 2022)
Johannes Gasteiger, Muhammed Shuaibi, Anuroop Sriram, Stephan Günnemann, Zachary W. Ulissi, C. L. Zitnick, Abhishek Das

Training Compute-Optimal Large Language Models (29 Mar 2022)
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization (26 Jan 2022)
G. Luca, E. Silverstein

Hydra: A System for Large Multi-Model Deep Learning (16 Oct 2021)
Kabir Nagrecha, Arun Kumar

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models (13 Aug 2021)
Conglong Li, Minjia Zhang, Yuxiong He

MLP-Mixer: An all-MLP Architecture for Vision (04 May 2021)
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Learning by Turning: Neural Architecture Aware Optimisation (14 Feb 2021)
Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue

Temperature check: theory and practice for training models with softmax-cross-entropy losses (14 Oct 2020)
Atish Agarwala, Jeffrey Pennington, Yann N. Dauphin, S. Schoenholz

On the distance between two neural networks and the stability of learning (09 Feb 2020)
Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman