1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,585 papers shown
Title
Revisiting Catastrophic Forgetting in Large Language Model Tuning
Hongyu Li
Liang Ding
Meng Fang
Dacheng Tao
CLL
KELM
109
23
0
07 Jun 2024
Error Bounds of Supervised Classification from Information-Theoretic Perspective
Binchuan Qi
Wei Gong
Li Li
100
0
0
07 Jun 2024
Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection
Yinting Wu
Pai Peng
Bo Cai
Le Li
AAML
95
2
0
06 Jun 2024
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus
Maksim Kuznetsov
Roman Schutski
Rim Shayakhmetov
Daniil Polykovskiy
Sarath Chandar
Alex Zhavoronkov
DiffM
AI4CE
112
12
0
06 Jun 2024
A Universal Class of Sharpness-Aware Minimization Algorithms
B. Tahmasebi
Ashkan Soleymani
Dara Bahri
Stefanie Jegelka
Patrick Jaillet
AAML
149
4
0
06 Jun 2024
Cyclic Sparse Training: Is it Enough?
Advait Gadhikar
Sree Harsha Nelaturu
R. Burkholz
CLL
157
0
0
04 Jun 2024
Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho
Yoshihiro Sakai
Kenshiro Tanaka
Mariko Kato
Naoya Inoue
125
3
0
03 Jun 2024
Mixup Augmentation with Multiple Interpolations
Lifeng Shen
Jincheng Yu
Hansi Yang
James T. Kwok
90
0
0
03 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
121
8
0
31 May 2024
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Mitchell Springer
Vaishnavh Nagarajan
Aditi Raghunathan
149
8
0
30 May 2024
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
Ziqing Fan
Shengchao Hu
Jiangchao Yao
Gang Niu
Ya Zhang
Masashi Sugiyama
Yanfeng Wang
FedML
120
17
0
29 May 2024
Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
Ruipeng Zhang
Ziqing Fan
Jiangchao Yao
Ya Zhang
Yanfeng Wang
98
7
0
29 May 2024
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
101
2
0
29 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
88
1
0
28 May 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei
Di Hu
123
27
0
28 May 2024
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Sheng-Hsuan Peng
Pin-Yu Chen
Matthew Hull
Duen Horng Chau
163
35
0
27 May 2024
MCGAN: Enhancing GAN Training with Regression-Based Generator Loss
Baoren Xiao
Hao Ni
Weixin Yang
GAN
318
1
0
27 May 2024
The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective
Nils Philipp Walter
Linara Adilova
Jilles Vreeken
Michael Kamp
AAML
187
2
0
27 May 2024
Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
Runqi Lin
Chaojian Yu
Bo Han
Hang Su
Tongliang Liu
AAML
147
4
0
25 May 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
302
9
1
25 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
127
3
0
24 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
128
13
0
23 May 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
133
3
0
23 May 2024
Improving Generalization of Deep Neural Networks by Optimum Shifting
Yuyan Zhou
Ye Li
Lei Feng
Sheng-Jun Huang
OOD
ODL
74
0
0
23 May 2024
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion
Lénaic Chizat
ODL
138
9
0
22 May 2024
SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
Sakshi Choudhary
Sai Aparna Aketi
Kaushik Roy
FedML
130
0
0
22 May 2024
Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Xin-Chun Li
Jinli Tang
Bo Zhang
Lan Li
De-Chuan Zhan
112
2
0
21 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu
Xin-Chun Li
Lan Li
De-Chuan Zhan
150
2
0
21 May 2024
Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features
Junpeng Zhang
Qing Li
Liang Lin
Quanshi Zhang
AI4CE
154
5
0
16 May 2024
MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning
Xingyu Li
Bo Tang
VLM
CLL
105
0
0
15 May 2024
Why is SAM Robust to Label Noise?
Christina Baek
Zico Kolter
Aditi Raghunathan
NoLa
AAML
138
15
0
06 May 2024
Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Zhiwei Wang
Lulu Zhang
Zhongwang Zhang
Z. Xu
95
1
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
136
4
0
02 May 2024
PackVFL: Efficient HE Packing for Vertical Federated Learning
Liu Yang
Shuowei Cai
Di Chai
Junxue Zhang
Han Tian
Yilun Jin
Kun Guo
Kai Chen
Qiang Yang
FedML
98
1
0
01 May 2024
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
305
2
0
30 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
110
0
0
25 Apr 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
104
0
0
24 Apr 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
101
0
0
19 Apr 2024
Singular-limit analysis of gradient descent with noise injection
Anna Shalova
André Schlichting
M. Peletier
92
2
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
123
2
0
17 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
163
0
0
11 Apr 2024
Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Zhanran Lin
Puheng Li
Lei Wu
299
9
0
09 Apr 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
107
0
0
05 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
58
16
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
216
8
0
04 Apr 2024
Make Continual Learning Stronger via C-Flat
Ang Bian
Wei Li
Hangjie Yuan
Chengrong Yu
Zixiang Zhao
Mang Wang
Aojun Lu
Tao Feng
108
15
0
01 Apr 2024
Revisiting Random Weight Perturbation for Efficiently Improving Generalization
Tao Li
Qinghua Tao
Weihao Yan
Zehao Lei
Yingwen Wu
Kun Fang
Mingzhen He
Xiaolin Huang
AAML
157
7
0
30 Mar 2024
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
55
3
0
29 Mar 2024
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODD
MoMe
164
54
0
28 Mar 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao
Yong Lin
Difan Zou
Tong Zhang
OODD
OOD
128
6
0
26 Mar 2024