Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

7 March 2022

Xiaodong Liu

Papers citing "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer"

Auxiliary-Hyperparameter-Free Sampling: Entropy Equilibrium for Text Generation

114

30 Nov 2025

Controlling changes to attention logits

Ben Anson

Laurence Aitchison

225

26 Nov 2025

Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

420

23 Nov 2025

Deep Progressive Training: scaling up depth capacity of zero/one-layer models

Zhiqi Bu

AI4CE

164

07 Nov 2025

A Proof of Learning Rate Transfer under μP

Soufiane Hayou

MLT

190

03 Nov 2025

Quantitative Bounds for Length Generalization in Transformers

Zachary Izzo

Eshaan Nichani

Jason D. Lee

298

30 Oct 2025

Zero-Shot Performance Prediction for Probabilistic Scaling Laws

170

19 Oct 2025

Robust Layerwise Scaling Rules by Proper Weight Decay Tuning

141

17 Oct 2025

Spectral Alignment as Predictor of Loss Explosion in Neural Network Training

142

05 Oct 2025

Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

275

05 Oct 2025

Optimal Scaling Needs Optimal Norm

238

04 Oct 2025

Muon: Training and Trade-offs with Latent Attention and MoE

160

29 Sep 2025

Train Once, Answer All: Many Pretraining Experiments for the Cost of One

Sebastian Bordt

Martin Pawelczyk

CLL

259

27 Sep 2025

Pre-training under infinite compute

306

18 Sep 2025

Deep Learning-Driven Peptide Classification in Biological Nanopores

133

17 Sep 2025

LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization

Yunfei Teng

Sixin Zhang

204

03 Sep 2025

FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics

...

133

13 Aug 2025

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

International Conference on Learning Representations (ICLR), 2025

Saket Tiwari

Omer Gottesman

George Konidaris

377

28 Jul 2025

What Can Grokking Teach Us About Learning Under Nonstationarity?

236

26 Jul 2025

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

332

11 Jul 2025

The Importance of Being Lazy: Scaling Limits of Continual Learning

381

20 Jun 2025

Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size

Soufiane Hayou

Liyuan Liu

197

17 Jun 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

...

360

09 Jun 2025

A Stable Whitening Optimizer for Efficient Neural Network Training

Kevin Frans

Sergey Levine

Pieter Abbeel

513

08 Jun 2025

Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias

512

06 Jun 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

D. Kunin

Giovanni Luca Marchetti

549

06 Jun 2025

Horizon Reduction Makes RL Scalable

723

04 Jun 2025

Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics

313

29 May 2025

Variational Deep Learning via Implicit Regularization

362

26 May 2025

Small-to-Large Generalization: Data Influences Models Consistently Across Scale

372

22 May 2025

Short-Range Dependency Effects on Transformer Instability and a Decomposed Attention Solution

Suvadeep Hajra

323

21 May 2025

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models

468

20 May 2025

Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data

...

470

08 May 2025

On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition

Adrian Cosma

Andy Catruna

Emilian Radoi

457

10 Apr 2025

Prot42: a Novel Family of Protein Language Models for Target-aware Protein Binder Generation

Mohammad Amaan Sayeed

392

06 Apr 2025

Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

Mohammad Amaan Sayeed

Natalia Vassilieva

Boulbaba Ben Amor

452

20 Mar 2025

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

645

14 Mar 2025

Learning richness modulates equality reasoning in neural networks

William L. Tong

Cengiz Pehlevan

448

12 Mar 2025

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

435

26 Feb 2025

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

538

26 Feb 2025

(Mis)Fitting: A Survey of Scaling Laws

Margaret Li

Sneha Kudugunta

Luke Zettlemoyer

479

26 Feb 2025

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Florent Bartoccioni

Elias Ramzi

Victor Besnier

Shashanka Venkataramanan

...

349

24 Feb 2025

Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

797

24 Feb 2025

Towards Precise Scaling Laws for Video Diffusion Transformers

Computer Vision and Pattern Recognition (CVPR), 2024

...

559

03 Jan 2025

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

International Conference on Learning Representations (ICLR), 2024

Aldo Pareja

Nikhil Shivakumar Nayak

Hao Wang

Krishnateja Killamsetty

Shivchander Sudalairaj

...

476

17 Dec 2024

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Neural Information Processing Systems (NeurIPS), 2024

515

11 Nov 2024

Scaling Laws for Precision

International Conference on Learning Representations (ICLR), 2024

494

07 Nov 2024

Crystal: Illuminating LLM Abilities on Language and Code

...

239

06 Nov 2024

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

677

04 Nov 2024

How Does Critical Batch Size Scale in Pre-training?

International Conference on Learning Representations (ICLR), 2024

766

29 Oct 2024