On the Variance of the Adaptive Learning Rate and Beyond

International Conference on Learning Representations (ICLR), 2020
8 August 2019
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
    ODL
ArXiv (abs) | PDF | HTML | GitHub (2548★)

Papers citing "On the Variance of the Adaptive Learning Rate and Beyond"

50 / 915 papers shown
Controlling changes to attention logits
Ben Anson
Laurence Aitchison
194
0
0
26 Nov 2025
HVAdam: A Full-Dimension Adaptive Optimizer
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yiheng Zhang
Shaowu Wu
Yuanzhuo Xu
Jiajun Wu
Shang Xu
Steve Drew
Xiaoguang Niu
217
0
0
25 Nov 2025
GLOBE: Accurate and Generalizable PDE Surrogates using Domain-Inspired Architectures and Equivariances
Peter Sharpe
AI4CE
191
0
0
19 Nov 2025
Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks
Guillaume Infantes
Stéphanie Roussel
Antoine Jacquet
Vincent Baudoui
81
0
0
17 Nov 2025
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Meng Zhu
Quan Xiao
Weidong Min
289
0
0
17 Nov 2025
From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression
Chaoyi Lin
Yaojun Wu
Yue Li
Junru Li
Kai Zhang
Li Zhang
207
0
0
11 Nov 2025
QuAnTS: Question Answering on Time Series
Felix Divo
Maurice Kraus
Anh Q. Nguyen
Hao Xue
Imran Razzak
Flora D. Salim
Kristian Kersting
Devendra Singh Dhami
131
1
0
07 Nov 2025
MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification
Zijiang Yang
Hanqing Chao
Bokai Zhao
Yelin Yang
Yunshuo Zhang
...
K. Yan
Dakai Jin
Minfeng Xu
Yun Bian
Hui Jiang
320
0
0
07 Nov 2025
The Neural Differential Manifold: An Architecture with Explicit Geometric Structure
Di Zhang
109
1
0
29 Oct 2025
Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels
Keisuke Imoto
86
0
0
29 Oct 2025
Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
Zhifeng Wang
Longlong Li
Chunyan Zeng
119
0
0
29 Oct 2025
Poisson Flow Consistency Training
Anthony Zhang
Mahmut S. Gokmen
Dennis Hein
Rongjun Ge
Wenjun Xia
Ge Wang
Jin Chen
OOD
161
0
0
23 Oct 2025
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
In-Hwan Jin
Hyeongju Mun
Joonsoo Kim
Kugjin Yun
Kyeongbo Kong
3DGS
MoE
208
0
0
22 Oct 2025
Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-trait Recognition
Ryo Masumura
Shota Orihashi
Mana Ihori
Tomohiro Tanaka
Naoki Makishima
Taiga Yamane
Naotaka Kawata
Satoshi Suzuki
Taichi Katayama
102
0
0
16 Oct 2025
Generating healthy counterfactuals with denoising diffusion bridge models
Ana Lawry Aguila
Peirong Liu
Marina Crespo Aguirre
J. Iglesias
DiffM
MedIm
112
0
0
15 Oct 2025
PruneGCRN: Minimizing and explaining spatio-temporal problems through node pruning
Javier García-Sigüenza
Mirco Nanni
Faraón Llorens-Largo
José F. Vicent
122
0
0
12 Oct 2025
Stability of Transformers under Layer Normalization
Kelvin Kan
Xingjian Li
Benjamin J. Zhang
Tuhin Sahai
Stanley Osher
Krishna Kumar
Markos A. Katsoulakis
141
1
0
10 Oct 2025
MAT-Agent: Adaptive Multi-Agent Training Optimization
Jusheng Zhang
Kaitong Cai
Yijia Fan
Ningyuan Liu
Keze Wang
176
37
0
10 Oct 2025
Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics
Luca Wolf
Tobias Buck
Bjoern Malte Schaefer
142
0
0
07 Oct 2025
Explore the Loss space with Hill-ADAM
Meenakshi Manikandan
Leilani Gilpin
ODL
221
0
0
04 Oct 2025
Topological Invariance and Breakdown in Learning
Yongyi Yang
Tomaso Poggio
Isaac Chuang
Liu Ziyin
138
0
0
03 Oct 2025
Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents
Beomsu Kim
Byunghee Cha
Jong Chul Ye
167
0
0
01 Oct 2025
Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context
Information Sciences (Inf. Sci.), 2025
Y. X. R. Wang
Weigang Li
Wenping Liu
Zhe Xu
Zhiqiang Tian
3DPC
130
2
0
29 Sep 2025
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Zheyuan Hu
Chieh-Hsin Lai
Yuki Mitsufuji
Stefano Ermon
133
7
0
29 Sep 2025
A regret minimization approach to fixed-point iterations
Joon Kwon
151
0
0
25 Sep 2025
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
Xinyu Lian
Masahiro Tanaka
Olatunji Ruwase
Minjia Zhang
131
3
0
25 Sep 2025
Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Doğay Altınel
145
1
0
22 Sep 2025
CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yao Du
Jiarong Guo
Xiaomeng Li
194
0
0
21 Sep 2025
On the Convergence of Muon and Beyond
Da Chang
Yongxiang Liu
Ganzhao Yuan
380
5
0
19 Sep 2025
From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Carlos Núñez-Molina
Vicenç Gómez
Héctor Geffner
198
0
0
16 Sep 2025
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
Marco Pasini
Stefan Lattner
George Fazekas
152
2
0
11 Sep 2025
Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence
Yuxing Liu
Yuze Ge
Rui Pan
An Kang
Tong Zhang
AI4CE
203
3
0
09 Sep 2025
Sem-RaDiff: Diffusion-Based 3D Radar Semantic Perception in Cluttered Agricultural Environments
Ruibin Zhang
Fei Gao
209
1
0
02 Sep 2025
StoxLSTM: A Stochastic Extended Long Short-Term Memory Network for Time Series Forecasting
Zihao Wang
Yunjie Li
Lingmin Zan
Zheng Gong
Mengtao Zhu
AI4TS
BDL
207
1
0
01 Sep 2025
Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent
Yixin Gao
Xin Li
Xiaohan Pan
Runsen Feng
Bingchen Li
Y. Qi
Y. Lu
Zhengxue Cheng
Zhibo Chen
Jörn Ostermann
155
0
0
21 Aug 2025
HandCraft: Dynamic Sign Generation for Synthetic Data Augmentation
Gaston Gustavo Rios
P. D. Bianco
Franco Ronchetti
F. Quiroga
Oscar Stanchi
Santiago Ponte Ahón
Waldo Hasperué
SLR
309
1
0
20 Aug 2025
MuFlex: A Scalable, Physics-based Platform for Multi-Building Flexibility Analysis and Coordination
Ziyan Wu
Ivan Korolija
Rui Tang
AI4CE
182
0
0
19 Aug 2025
GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks
Sergey Salishev
Ian Akhremchik
MQ
386
1
0
19 Aug 2025
MASIV: Toward Material-Agnostic System Identification from Videos
Yizhou Zhao
Haoyu Chen
Chunjiang Liu
Zhenyang Li
Charles Herrmann
Junhwa Hur
Yinxiao Li
Ming-Hsuan Yang
Bhiksha Raj
Min Xu
PINN
196
1
0
01 Aug 2025
AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock
Umair Nawaz
Muhammad Zaigham Zaheer
Fahad Shahbaz Khan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
144
1
0
29 Jul 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
YuXin Li
Felix Dangel
Derek Tam
Colin Raffel
254
5
0
24 Jul 2025
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference
Amirarsalan Moatazedian
Yauhen Yakimenka
Rémi A. Chou
J. Kliewer
93
0
0
23 Jul 2025
TTMBA: Towards Text To Multiple Sources Binaural Audio Generation
Yuxuan He
Xiaoran Yang
Ningning Pan
Gongping Huang
197
0
0
22 Jul 2025
Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
Go Nishikawa
Wataru Nakata
Yuki Saito
Kanami Imamura
Hiroshi Saruwatari
Tomohiko Nakamura
178
0
0
19 Jul 2025
Feature-Enhanced TResNet for Fine-Grained Food Image Classification
Lulu Liu
Zhiyong Xiao
219
1
0
17 Jul 2025
Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation
Mohammad Rostami
Dayuan Jian
Ruitong Sun
341
1
0
01 Jul 2025
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines
Camilo Luciano Fosco
Emilie Josephs
A. Andonian
Allen Lee
540
4
0
01 Jul 2025
ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors
Junghyun Koo
Marco A. Martínez-Ramírez
Wei-Hsiang Liao
Giorgio Fabbro
Michele Mancusi
Yuki Mitsufuji
275
1
0
20 Jun 2025
Rethinking Losses for Diffusion Bridge Samplers
Sebastian Sanokowski
Lukas Gruber
Christoph Bartmann
Sepp Hochreiter
Sebastian Lehner
DiffM
391
4
0
12 Jun 2025
An Adaptive Method Stabilizing Activations for Enhanced Generalization
Hyunseok Seung
Jaewoo Lee
Hyunsuk Ko
ODL
300
0
0
10 Jun 2025