ResearchTrend.AI
© 2026 ResearchTrend.AI, All rights reserved.

arXiv: 2203.03466
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer


7 March 2022
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao

Papers citing "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer"

50 / 150 papers shown
Auxiliary-Hyperparameter-Free Sampling: Entropy Equilibrium for Text Generation
Xiaodong Cai
Hai Lin
Shaoxiong Zhan
Weiqi Luo
Hong-Gee Kim
Hongyan Hao
Yu Yang
Hai-Tao Zheng
114
0
0
30 Nov 2025
Controlling changes to attention logits
Ben Anson
Laurence Aitchison
225
0
0
26 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG MU MoE LRM
420
0
0
23 Nov 2025
Deep Progressive Training: scaling up depth capacity of zero/one-layer models
Zhiqi Bu
AI4CE
164
0
0
07 Nov 2025
A Proof of Learning Rate Transfer under μP
Soufiane Hayou
MLT
190
1
0
03 Nov 2025
Quantitative Bounds for Length Generalization in Transformers
Zachary Izzo
Eshaan Nichani
Jason D. Lee
298
5
0
30 Oct 2025
Zero-Shot Performance Prediction for Probabilistic Scaling Laws
Viktoria Schram
Markus Hiller
Daniel Beck
Trevor Cohn
170
0
0
19 Oct 2025
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
Zhiyuan Fan
Yifeng Liu
Qingyue Zhao
Angela Yuan
Quanquan Gu
141
3
0
17 Oct 2025
Spectral Alignment as Predictor of Loss Explosion in Neural Network Training
Haiquan Qiu
You Wu
Yingjie Tan
Yaqing Wang
Quanming Yao
142
0
0
05 Oct 2025
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Haosong Zhang
Shenxi Wu
Yichi Zhang
Wei Lin
W. Lin
275
0
0
05 Oct 2025
Optimal Scaling Needs Optimal Norm
Oleg Filatov
Jiangtao Wang
J. Ebert
Stefan Kesselheim
238
3
0
04 Oct 2025
Muon: Training and Trade-offs with Latent Attention and MoE
Sushant Mehta
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
160
0
0
29 Sep 2025
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Sebastian Bordt
Martin Pawelczyk
CLL
259
2
0
27 Sep 2025
Pre-training under infinite compute
Konwoo Kim
Suhas Kotha
Abigail Z. Jacobs
Tatsunori Hashimoto
306
11
0
18 Sep 2025
Deep Learning-Driven Peptide Classification in Biological Nanopores
S. Tovey
Julian Hoßbach
Sandro Kuppel
Tobias Ensslen
Jan C. Behrends
Christian Holm
133
0
0
17 Sep 2025
LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization
Yunfei Teng
Sixin Zhang
204
0
0
03 Sep 2025
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
David K. Park
Shuhang Li
Y. Huang
Xihaier Luo
Haiwang Yu
...
Lu Ma
Shinjae Yoo
Joseph Osborn
Jin-zhi Huang
Zhongjing Jiang
AI4CE
133
2
0
13 Aug 2025
Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
International Conference on Learning Representations (ICLR), 2025
Saket Tiwari
Omer Gottesman
George Konidaris
377
3
0
28 Jul 2025
What Can Grokking Teach Us About Learning Under Nonstationarity?
Clare Lyle
Ghada Sokar
Razvan Pascanu
András Gyorgy
236
4
0
26 Jul 2025
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song
Weilin Zhao
Xu Han
Chaojun Xiao
Yingfa Chen
Yuxuan Li
Zhiyuan Liu
Maosong Sun
MoE
332
2
0
11 Jul 2025
The Importance of Being Lazy: Scaling Limits of Continual Learning
Jacopo Graldi
Alessandro Breccia
Giulia Lanzillotta
Thomas Hofmann
Lorenzo Noci
CLL
381
3
0
20 Jun 2025
Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size
Soufiane Hayou
Liyuan Liu
197
3
0
17 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
360
34
0
09 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
513
8
0
08 Jun 2025
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Yuanzhe Hu
Kinshuk Goel
Vlad Killiakov
Yaoqing Yang
512
5
0
06 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
549
7
0
06 Jun 2025
Horizon Reduction Makes RL Scalable
Seohong Park
Kevin Frans
Deepinder Mann
Benjamin Eysenbach
Aviral Kumar
Sergey Levine
OffRL
723
24
0
04 Jun 2025
Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics
Shiwei Li
Xiandi Luo
Xing Tang
Haozhao Wang
Hao Chen
Weihong Luo
Yuhua Li
Xiuqiang He
Ruixuan Li
AI4CE
313
12
0
29 May 2025
Variational Deep Learning via Implicit Regularization
Jonathan Wenger
Beau Coker
Juraj Marusic
John P. Cunningham
OOD UQCV FedML
362
1
0
26 May 2025
Small-to-Large Generalization: Data Influences Models Consistently Across Scale
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
TDI AI4CE
372
1
0
22 May 2025
Short-Range Dependency Effects on Transformer Instability and a Decomposed Attention Solution
Suvadeep Hajra
323
1
0
21 May 2025
The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models
Adrian Cosma
Stefan Ruseti
Emilian Radoi
Mihai Dascalu
LRM
468
9
0
20 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Yun Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
470
23
0
08 May 2025
On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
Adrian Cosma
Andy Catruna
Emilian Radoi
457
1
0
10 Apr 2025
Prot42: a Novel Family of Protein Language Models for Target-aware Protein Binder Generation
Mohammad Amaan Sayeed
Engin Tekin
Maryam Nadeem
Nancy A. ElNaker
A. Singh
Natalia Vassilieva
Boulbaba Ben Amor
392
3
0
06 Apr 2025
Chem42: a Family of chemical Language Models for Target-aware Ligand Generation
A. Singh
Engin Tekin
Maryam Nadeem
Nancy A. ElNaker
Mohammad Amaan Sayeed
Natalia Vassilieva
Boulbaba Ben Amor
452
1
0
20 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM MU
645
34
0
14 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
448
0
0
12 Mar 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
435
14
0
26 Feb 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
538
15
0
26 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
479
14
0
26 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
349
17
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
797
7
0
24 Feb 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2024
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
559
13
0
03 Jan 2025
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
International Conference on Learning Representations (ICLR), 2024
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
476
37
0
17 Dec 2024
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Neural Information Processing Systems (NeurIPS), 2024
Chaeyun Jang
Hyungi Lee
Jungtaek Kim
Juho Lee
MoMe
515
3
0
11 Nov 2024
Scaling Laws for Precision
International Conference on Learning Representations (ICLR), 2024
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
Cengiz Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin MoMe
494
74
0
07 Nov 2024
Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao
Junbo Li
Bowen Tan
Hongyi Wang
William Marshall
...
Joel Hestness
Natalia Vassilieva
Zhiqiang Shen
Eric P. Xing
Zhengzhong Liu
239
8
0
06 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
Zhiyuan Liu
Maosong Sun
677
19
0
04 Nov 2024
How Does Critical Batch Size Scale in Pre-training?
International Conference on Learning Representations (ICLR), 2024
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
766
49
0
29 Oct 2024
Page 1 of 3