
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

9 March 2020
Nikhil Iyer
V. Thejas
Nipun Kwatra
Ramachandran Ramjee
Muthian Sivathanu
arXiv: 2003.03977
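The paper above proposes an explore-exploit learning rate schedule. As a quick illustration, here is a minimal Python sketch of such a schedule, assuming the common form of this idea: hold a high constant learning rate during an initial explore phase, then decay it during the exploit phase. The function name, the linear decay choice, and all parameter values are illustrative assumptions, not taken from the paper.

    def explore_exploit_lr(step, total_steps, explore_steps, peak_lr, min_lr=0.0):
        """Piecewise schedule: constant peak LR while exploring, then linear decay."""
        if step < explore_steps:
            return peak_lr  # explore phase: stay at the high learning rate
        # exploit phase: decay linearly from peak_lr toward min_lr
        progress = min(1.0, (step - explore_steps) / max(1, total_steps - explore_steps))
        return peak_lr + (min_lr - peak_lr) * progress

    # Example (illustrative values): 100k steps total, first half at peak LR 0.1
    lrs = [explore_exploit_lr(s, 100_000, 50_000, 0.1) for s in range(100_000)]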

Papers citing "Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule"

18 of 18 papers shown:
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García, Miguel Cazorla, Jorge Pomares
25 Feb 2025
Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov
AI4CE
29 Oct 2024
A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models
Zhihao Wang, Shiyu Liu, Jianheng Huang, Zheng Wang, Yixuan Liao, Xiaoxin Chen, Junfeng Yao, Jinsong Su
05 Oct 2024
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo
23 Apr 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin, Matthias Winkenbach, Akash Srivastava
28 Feb 2024
Unraveling Key Factors of Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo
14 Dec 2023
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov
AI4CE
19 Nov 2023
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato, Stavroula Mougiakakou
04 Sep 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023
Relaxed Attention for Transformer Models
Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt
KELM
20 Sep 2022
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022
Distance Learner: Incorporating Manifold Prior to Model Training
Aditya Chetan, Nipun Kwatra
14 Jul 2022
Efficient Multi-Purpose Cross-Attention Based Image Alignment Block for Edge Devices
Bahri Batuhan Bilecen, Alparslan Fisne, Mustafa Ayazoglu
01 Jun 2022
IMDeception: Grouped Information Distilling Super-Resolution Network
Mustafa Ayazoglu
25 Apr 2022
Ranger21: a synergistic deep learning optimizer
Less Wright, Nestor Demeure
ODL, AI4CE
25 Jun 2021
LRTuner: A Learning Rate Tuner for Deep Neural Networks
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
ODL
30 May 2021
Understanding Decoupled and Early Weight Decay
Johan Bjorck, Kilian Q. Weinberger, Carla P. Gomes
27 Dec 2020
BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention
Zhebin Zhang, Sai Wu, Dawei Jiang, Gang Chen
09 Nov 2020