Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Journal of Machine Learning Research (JMLR), 2020
9 March 2020
arXiv: 2003.03977
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Papers citing "Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule"

21 papers shown
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
Jiuan Zhou, Yu Cheng, Yuan Xie, Z. Yin
08 Oct 2025

Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Doğay Altınel
22 Sep 2025

Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
02 Aug 2025

Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García, Miguel Cazorla, Jorge Pomares
25 Feb 2025

Where Do Large Learning Rates Lead Us?
Neural Information Processing Systems (NeurIPS), 2024
Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov
29 Oct 2024

A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhihao Wang, Shiyu Liu, Jianheng Huang, Zheng Wang, Yixuan Liao, Xiaoxin Chen, Junfeng Yao, Jinsong Su
05 Oct 2024

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo
23 Apr 2024

Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin, Matthias Winkenbach, Akash Srivastava
28 Feb 2024

Unraveling Key Factors of Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo
14 Dec 2023

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov
19 Nov 2023

No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato, Stavroula Mougiakakou
04 Sep 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

Relaxed Attention for Transformer Models
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt
20 Sep 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Neural Information Processing Systems (NeurIPS), 2022
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022

Distance Learner: Incorporating Manifold Prior to Model Training
Aditya Chetan, Nipun Kwatra
14 Jul 2022

Efficient Multi-Purpose Cross-Attention Based Image Alignment Block for Edge Devices
Bahri Batuhan Bilecen, Alparslan Fisne, Mustafa Ayazoglu
01 Jun 2022

IMDeception: Grouped Information Distilling Super-Resolution Network
Mustafa Ayazoglu
25 Apr 2022

Ranger21: a synergistic deep learning optimizer
Less Wright, Nestor Demeure
25 Jun 2021

LRTuner: A Learning Rate Tuner for Deep Neural Networks
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
30 May 2021

Understanding Decoupled and Early Weight Decay
AAAI Conference on Artificial Intelligence (AAAI), 2020
Johan Bjorck, Kilian Q. Weinberger, Daniel Schwalbe-Koda
27 Dec 2020

BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention
Zhebin Zhang, Sai Wu, Dawei Jiang, Gang Chen
09 Nov 2020