Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
On the Trade-off between Flatness and Optimization in Distributed Learning
Ying Cao
Zhaoxian Wu
Kun Yuan
Ali H. Sayed
103
1
0
28 Jun 2024
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
3DGS
81
14
0
26 Jun 2024
MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
Liuyi Wang
Zongtao He
Mengjiao Shen
Jingwei Yang
Chengju Liu
Qijun Chen
VLM
108
2
0
25 Jun 2024
Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski
76
0
0
24 Jun 2024
MD tree: a model-diagnostic tree grown on loss landscape
Yefan Zhou
Jianlong Chen
Qinxue Cao
Konstantin Schürholt
Yaoqing Yang
90
2
0
24 Jun 2024
Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution
Naoki Yoshida
Shogo H. Nakakita
Masaaki Imaizumi
57
1
0
23 Jun 2024
DataFreeShield: Defending Adversarial Attacks without Training Data
Hyeyoon Lee
Kanghyun Choi
Dain Kwon
Sunjong Park
Mayoore S. Jaiswal
Noseong Park
Jonghyun Choi
Jinho Lee
78
0
0
21 Jun 2024
Flat Posterior Does Matter For Bayesian Model Averaging
Sungjun Lim
Jeyoon Yeom
Sooyon Kim
Hoyoon Byun
Jinho Kang
Yohan Jung
Jiyoung Jung
Kyungwoo Song
BDL
AAML
101
0
0
21 Jun 2024
Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Tanapat Ratchatorn
Masayuki Tanaka
AAML
106
1
0
20 Jun 2024
Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma
Nikhil Muralidhar
Shengzhe Xu
Raquib Bin Yousuf
Naren Ramakrishnan
97
0
0
20 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
84
1
0
20 Jun 2024
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
Zhuoxiao Chen
Zixin Wang
Yadan Luo
Sen Wang
Zi Huang
AAML
3DPC
46
1
0
19 Jun 2024
Low-Resource Machine Translation through the Lens of Personalized Federated Learning
Viktor Moskvoretskii
N. Tupitsa
Chris Biemann
Samuel Horváth
Eduard A. Gorbunov
Irina Nikishina
FedML
87
0
0
18 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
56
1
0
17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
84
1
0
14 Jun 2024
When Will Gradient Regularization Be Harmful?
Yang Zhao
Hao Zhang
Xiuyuan Hu
AI4CE
65
1
0
14 Jun 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai
Jingfeng Wu
Song Mei
Michael Lindsey
Peter L. Bartlett
84
4
0
12 Jun 2024
Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation
Shuyu Yin
Fei Wen
Peilin Liu
Tao Luo
70
0
0
12 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
124
1
0
12 Jun 2024
Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen
Quyen Tran
Tuan Truong
Thanh-Toan Do
Dinh Q. Phung
Trung Le
98
0
0
11 Jun 2024
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao
Kaiqi Zhang
Esha Singh
Daniel Soudry
Yu-Xiang Wang
NoLa
84
4
0
10 Jun 2024
Revisiting Catastrophic Forgetting in Large Language Model Tuning
Hongyu Li
Liang Ding
Meng Fang
Dacheng Tao
CLL
KELM
84
19
0
07 Jun 2024
Error Bounds of Supervised Classification from Information-Theoretic Perspective
Binchuan Qi
Wei Gong
Li Li
51
0
0
07 Jun 2024
Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection
Yinting Wu
Pai Peng
Bo Cai
Le Li
.
AAML
61
0
0
06 Jun 2024
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus
Maksim Kuznetsov
Roman Schutski
Rim Shayakhmetov
Daniil Polykovskiy
Sarath Chandar
Alex Zhavoronkov
DiffM
AI4CE
74
6
0
06 Jun 2024
A Universal Class of Sharpness-Aware Minimization Algorithms
B. Tahmasebi
Ashkan Soleymani
Dara Bahri
Stefanie Jegelka
Patrick Jaillet
AAML
81
3
0
06 Jun 2024
Cyclic Sparse Training: Is it Enough?
Advait Gadhikar
Sree Harsha Nelaturu
R. Burkholz
CLL
91
0
0
04 Jun 2024
Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho
Yoshihiro Sakai
Kenshiro Tanaka
Mariko Kato
Naoya Inoue
74
2
0
03 Jun 2024
Mixup Augmentation with Multiple Interpolations
Lifeng Shen
Jincheng Yu
Hansi Yang
James T. Kwok
65
0
0
03 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
94
8
0
31 May 2024
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Mitchell Springer
Vaishnavh Nagarajan
Aditi Raghunathan
114
6
0
30 May 2024
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
Ziqing Fan
Shengchao Hu
Jiangchao Yao
Gang Niu
Ya Zhang
Masashi Sugiyama
Yanfeng Wang
FedML
98
13
0
29 May 2024
Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
Ruipeng Zhang
Ziqing Fan
Jiangchao Yao
Ya Zhang
Yanfeng Wang
74
7
0
29 May 2024
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
76
1
0
29 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
61
1
0
28 May 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei
Di Hu
90
16
0
28 May 2024
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Sheng-Hsuan Peng
Pin-Yu Chen
Matthew Hull
Duen Horng Chau
98
30
0
27 May 2024
MCGAN: Enhancing GAN Training with Regression-Based Generator Loss
Baoren Xiao
Hao Ni
Weixin Yang
GAN
144
0
0
27 May 2024
The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective
Nils Philipp Walter
Linara Adilova
Jilles Vreeken
Michael Kamp
AAML
106
2
0
27 May 2024
Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
Runqi Lin
Chaojian Yu
Bo Han
Hang Su
Tongliang Liu
AAML
117
4
0
25 May 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
116
7
1
25 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
68
2
0
24 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
80
9
0
23 May 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
79
2
0
23 May 2024
Improving Generalization of Deep Neural Networks by Optimum Shifting
Yuyan Zhou
Ye Li
Lei Feng
Sheng-Jun Huang
OOD
ODL
55
0
0
23 May 2024
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion
Lénaic Chizat
ODL
104
6
0
22 May 2024
SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
Sakshi Choudhary
Sai Aparna Aketi
Kaushik Roy
FedML
94
0
0
22 May 2024
Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Xin-Chun Li
Jinli Tang
Bo Zhang
Lan Li
De-Chuan Zhan
76
2
0
21 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu
Xin-Chun Li
Lan Li
De-Chuan Zhan
80
2
0
21 May 2024
Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features
Junpeng Zhang
Qing Li
Liang Lin
Quanshi Zhang
AI4CE
125
5
0
16 May 2024
Previous
1
2
3
4
5
...
30
31
32
Next