On the Variance of the Adaptive Learning Rate and Beyond

International Conference on Learning Representations (ICLR), 2020
8 August 2019
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
    ODL

Papers citing "On the Variance of the Adaptive Learning Rate and Beyond"

50 / 915 papers shown
Controlling changes to attention logits
Ben Anson
Laurence Aitchison
212
0
0
26 Nov 2025
HVAdam: A Full-Dimension Adaptive Optimizer
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yiheng Zhang
Shaowu Wu
Yuanzhuo Xu
Jiajun Wu
Shang Xu
Steve Drew
Xiaoguang Niu
230
0
0
25 Nov 2025
GLOBE: Accurate and Generalizable PDE Surrogates using Domain-Inspired Architectures and Equivariances
Peter Sharpe
AI4CE
223
0
0
19 Nov 2025
Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks
Guillaume Infantes
Stéphanie Roussel
Antoine Jacquet
Vincent Baudoui
101
0
0
17 Nov 2025
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Meng Zhu
Quan Xiao
Weidong Min
310
0
0
17 Nov 2025
From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression
Chaoyi Lin
Yaojun Wu
Yue Li
Junru Li
Kai Zhang
Li Zhang
218
0
0
11 Nov 2025
QuAnTS: Question Answering on Time Series
Felix Divo
Maurice Kraus
Anh Q. Nguyen
Hao Xue
Imran Razzak
Flora D. Salim
Kristian Kersting
Devendra Singh Dhami
138
1
0
07 Nov 2025
MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification
Zijiang Yang
Hanqing Chao
Bokai Zhao
Yelin Yang
Yunshuo Zhang
...
K. Yan
Dakai Jin
Minfeng Xu
Yun Bian
Hui Jiang
349
2
0
07 Nov 2025
The Neural Differential Manifold: An Architecture with Explicit Geometric Structure
Di Zhang
125
1
0
29 Oct 2025
Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels
Keisuke Imoto
102
0
0
29 Oct 2025
Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
Zhifeng Wang
Longlong Li
Chunyan Zeng
136
0
0
29 Oct 2025
Poisson Flow Consistency Training
Anthony Zhang
Mahmut S. Gokmen
Dennis Hein
Rongjun Ge
Wenjun Xia
Ge Wang
Jin Chen
OOD
176
0
0
23 Oct 2025
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
In-Hwan Jin
Hyeongju Mun
Joonsoo Kim
Kugjin Yun
Kyeongbo Kong
3DGS
MoE
232
0
0
22 Oct 2025
Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-trait Recognition
Ryo Masumura
Shota Orihashi
Mana Ihori
Tomohiro Tanaka
Naoki Makishima
Taiga Yamane
Naotaka Kawata
Satoshi Suzuki
Taichi Katayama
122
0
0
16 Oct 2025
Generating healthy counterfactuals with denoising diffusion bridge models
Ana Lawry Aguila
Peirong Liu
Marina Crespo Aguirre
J. Iglesias
DiffM
MedIm
137
0
0
15 Oct 2025
PruneGCRN: Minimizing and explaining spatio-temporal problems through node pruning
Javier García-Sigüenza
Mirco Nanni
Faraón Llorens-Largo
José F. Vicent
136
0
0
12 Oct 2025
Stability of Transformers under Layer Normalization
Kelvin Kan
Xingjian Li
Benjamin J. Zhang
Tuhin Sahai
Stanley Osher
Krishna Kumar
Markos A. Katsoulakis
168
3
0
10 Oct 2025
MAT-Agent: Adaptive Multi-Agent Training Optimization
Jusheng Zhang
Kaitong Cai
Yijia Fan
Ningyuan Liu
Keze Wang
208
39
0
10 Oct 2025
Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics
Luca Wolf
Tobias Buck
Bjoern Malte Schaefer
163
0
0
07 Oct 2025
Explore the Loss space with Hill-ADAM
Meenakshi Manikandan
Leilani Gilpin
ODL
237
0
0
04 Oct 2025
Topological Invariance and Breakdown in Learning
Yongyi Yang
Tomaso Poggio
Isaac Chuang
Liu Ziyin
150
0
0
03 Oct 2025
Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents
Beomsu Kim
Byunghee Cha
Jong Chul Ye
181
0
0
01 Oct 2025
Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context
Information Sciences (Inf. Sci.), 2025
Y. X. R. Wang
Weigang Li
Wenping Liu
Zhe Xu
Zhiqiang Tian
3DPC
182
2
0
29 Sep 2025
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Zheyuan Hu
Chieh-Hsin Lai
Yuki Mitsufuji
Stefano Ermon
167
10
0
29 Sep 2025
A regret minimization approach to fixed-point iterations
Joon Kwon
167
0
0
25 Sep 2025
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
Xinyu Lian
Masahiro Tanaka
Olatunji Ruwase
Minjia Zhang
152
3
0
25 Sep 2025
Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Doğay Altınel
165
1
0
22 Sep 2025
CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yao Du
Jiarong Guo
Xiaomeng Li
223
0
0
21 Sep 2025
On the Convergence of Muon and Beyond
Da Chang
Yongxiang Liu
Ganzhao Yuan
418
6
0
19 Sep 2025
From Next Token Prediction to (STRIPS) World Models
Carlos Núñez-Molina
Vicenç Gómez
Héctor Geffner
227
0
0
16 Sep 2025
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
Marco Pasini
Stefan Lattner
George Fazekas
164
2
0
11 Sep 2025
Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence
Yuxing Liu
Yuze Ge
Rui Pan
An Kang
Tong Zhang
AI4CE
248
3
0
09 Sep 2025
Sem-RaDiff: Diffusion-Based 3D Radar Semantic Perception in Cluttered Agricultural Environments
Ruibin Zhang
Fei Gao
232
1
0
02 Sep 2025
StoxLSTM: A Stochastic Extended Long Short-Term Memory Network for Time Series Forecasting
Zihao Wang
Yunjie Li
Lingmin Zan
Zheng Gong
Mengtao Zhu
AI4TS
BDL
224
1
0
01 Sep 2025
Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent
Yixin Gao
Xin Li
Xiaohan Pan
Runsen Feng
Bingchen Li
Y. Qi
Y. Lu
Zhengxue Cheng
Zhibo Chen
Jörn Ostermann
173
0
0
21 Aug 2025
HandCraft: Dynamic Sign Generation for Synthetic Data Augmentation
Gaston Gustavo Rios
P. D. Bianco
Franco Ronchetti
F. Quiroga
Oscar Stanchi
Santiago Ponte Ahón
Waldo Hasperué
SLR
330
1
0
20 Aug 2025
MuFlex: A Scalable, Physics-based Platform for Multi-Building Flexibility Analysis and Coordination
Ziyan Wu
Ivan Korolija
Rui Tang
AI4CE
220
0
0
19 Aug 2025
GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks
Sergey Salishev
Ian Akhremchik
MQ
399
1
0
19 Aug 2025
MASIV: Toward Material-Agnostic System Identification from Videos
Yizhou Zhao
Haoyu Chen
Chunjiang Liu
Zhenyang Li
Charles Herrmann
Junhwa Hur
Yinxiao Li
Ming-Hsuan Yang
Bhiksha Raj
Min Xu
PINN
224
3
0
01 Aug 2025
AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock
Umair Nawaz
Muhammad Zaigham Zaheer
Fahad Shahbaz Khan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
153
3
0
29 Jul 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
YuXin Li
Felix Dangel
Derek Tam
Colin Raffel
311
6
0
24 Jul 2025
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference
Amirarsalan Moatazedian
Yauhen Yakimenka
Rémi A. Chou
J. Kliewer
110
0
0
23 Jul 2025
TTMBA: Towards Text To Multiple Sources Binaural Audio Generation
Yuxuan He
Xiaoran Yang
Ningning Pan
Gongping Huang
235
0
0
22 Jul 2025
Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
Go Nishikawa
Wataru Nakata
Yuki Saito
Kanami Imamura
Hiroshi Saruwatari
Tomohiko Nakamura
192
1
0
19 Jul 2025
Feature-Enhanced TResNet for Fine-Grained Food Image Classification
Lulu Liu
Zhiyong Xiao
241
1
0
17 Jul 2025
Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation
Mohammad Rostami
Dayuan Jian
Ruitong Sun
357
1
0
01 Jul 2025
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines
Camilo Luciano Fosco
Emilie Josephs
A. Andonian
Allen Lee
571
4
0
01 Jul 2025
ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors
Junghyun Koo
Marco A. Martínez-Ramírez
Wei-Hsiang Liao
Giorgio Fabbro
Michele Mancusi
Yuki Mitsufuji
310
1
0
20 Jun 2025
Rethinking Losses for Diffusion Bridge Samplers
Sebastian Sanokowski
Lukas Gruber
Christoph Bartmann
Sepp Hochreiter
Sebastian Lehner
DiffM
421
6
0
12 Jun 2025
An Adaptive Method Stabilizing Activations for Enhanced Generalization
Hyunseok Seung
Jaewoo Lee
Hyunsuk Ko
ODL
326
0
0
10 Jun 2025