
On the Variance of the Adaptive Learning Rate and Beyond

International Conference on Learning Representations (ICLR), 2019

8 August 2019

Xiaodong Liu


Papers citing "On the Variance of the Adaptive Learning Rate and Beyond"

50 / 915 papers shown

Controlling changes to attention logits

Ben Anson

Laurence Aitchison

212

26 Nov 2025

HVAdam: A Full-Dimension Adaptive Optimizer

AAAI Conference on Artificial Intelligence (AAAI), 2025

230

25 Nov 2025

GLOBE: Accurate and Generalizable PDE Surrogates using Domain-Inspired Architectures and Equivariances

Peter Sharpe

AI4CE

223

19 Nov 2025

Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks

101

17 Nov 2025

AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate

Meng Zhu

Quan Xiao

Weidong Min

310

17 Nov 2025

From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression

218

11 Nov 2025

QuAnTS: Question Answering on Time Series

138

07 Nov 2025

MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification

...

349

07 Nov 2025

The Neural Differential Manifold: An Architecture with Explicit Geometric Structure

Di Zhang

125

29 Oct 2025

Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels

Keisuke Imoto

102

29 Oct 2025

Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training

Zhifeng Wang

Longlong Li

Chunyan Zeng

136

29 Oct 2025

Poisson Flow Consistency Training

176

23 Oct 2025

MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting

232

22 Oct 2025

Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-trait Recognition

122

16 Oct 2025

Generating healthy counterfactuals with denoising diffusion bridge models

Ana Lawry Aguila

Peirong Liu

Marina Crespo Aguirre

J. Iglesias

DiffM MedIm

137

15 Oct 2025

PruneGCRN: Minimizing and explaining spatio-temporal problems through node pruning

Javier García-Sigüenza

Mirco Nanni

Faraón Llorens-Largo

José F. Vicent

136

12 Oct 2025

Stability of Transformers under Layer Normalization

Markos A. Katsoulakis

168

10 Oct 2025

MAT-Agent: Adaptive Multi-Agent Training Optimization

208

10 Oct 2025

Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics

Luca Wolf

Tobias Buck

Bjoern Malte Schaefer

163

07 Oct 2025

Explore the Loss space with Hill-ADAM

Meenakshi Manikandan

Leilani Gilpin

ODL

237

04 Oct 2025

Topological Invariance and Breakdown in Learning

150

03 Oct 2025

Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents

Beomsu Kim

Byunghee Cha

Jong Chul Ye

181

01 Oct 2025

Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context

Information Sciences (Inf. Sci.), 2025

182

29 Sep 2025

CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

167

29 Sep 2025

A regret minimization approach to fixed-point iterations

Joon Kwon

167

25 Sep 2025

SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips

152

25 Sep 2025

Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules

Doğay Altınel

165

22 Sep 2025

CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Yao Du

Jiarong Guo

Xiaomeng Li

223

21 Sep 2025

On the Convergence of Muon and Beyond

Da Chang

Yongxiang Liu

Ganzhao Yuan

418

19 Sep 2025

From Next Token Prediction to (STRIPS) World Models

Carlos Núñez-Molina

Vicenç Gómez

Héctor Geffner

227

16 Sep 2025

CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio

Marco Pasini

Stefan Lattner

George Fazekas

164

11 Sep 2025

Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence

248

09 Sep 2025

Sem-RaDiff: Diffusion-Based 3D Radar Semantic Perception in Cluttered Agricultural Environments

Ruibin Zhang

Fei Gao

232

02 Sep 2025

StoxLSTM: A Stochastic Extended Long Short-Term Memory Network for Time Series Forecasting

224

01 Sep 2025

Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent

173

21 Aug 2025

HandCraft: Dynamic Sign Generation for Synthetic Data Augmentation

330

20 Aug 2025

MuFlex: A Scalable, Physics-based Platform for Multi-Building Flexibility Analysis and Coordination

220

19 Aug 2025

GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks

Sergey Salishev

Ian Akhremchik

399

19 Aug 2025

MASIV: Toward Material-Agnostic System Identification from Videos

224

01 Aug 2025

AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock

Umair Nawaz

Muhammad Zaigham Zaheer

153

29 Jul 2025

Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

311

24 Jul 2025

Minimax Data Sanitization with Distortion Constraint and Adversarial Inference

Amirarsalan Moatazedian

Yauhen Yakimenka

Rémi A. Chou

J. Kliewer

110

23 Jul 2025

TTMBA: Towards Text To Multiple Sources Binaural Audio Generation

235

22 Jul 2025

Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer

192

19 Jul 2025

Feature-Enhanced TResNet for Fine-Grained Food Image Classification

Lulu Liu

Zhiyong Xiao

241

17 Jul 2025

Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation

Mohammad Rostami

Dayuan Jian

Ruitong Sun

357

01 Jul 2025

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

571

01 Jul 2025

ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors

Junghyun Koo

Marco A. Martínez-Ramírez

310

20 Jun 2025

Rethinking Losses for Diffusion Bridge Samplers

421

12 Jun 2025

An Adaptive Method Stabilizing Activations for Enhanced Generalization

326

10 Jun 2025