1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,653 papers shown
DataFreeShield: Defending Adversarial Attacks without Training Data
Hyeyoon Lee
Kanghyun Choi
Dain Kwon
Sunjong Park
Mayoore S. Jaiswal
Noseong Park
Jonghyun Choi
Jinho Lee
268
1
0
21 Jun 2024
Flat Posterior Does Matter For Bayesian Model Averaging
Sungjun Lim
Jeyoon Yeom
Sooyon Kim
Hoyoon Byun
Jinho Kang
Yohan Jung
Jiyoung Jung
Kyungwoo Song
BDL
AAML
808
0
0
21 Jun 2024
Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Tanapat Ratchatorn
Masayuki Tanaka
AAML
282
1
0
20 Jun 2024
Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma
Nikhil Muralidhar
Shengzhe Xu
Raquib Bin Yousuf
Naren Ramakrishnan
293
0
0
20 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
310
3
0
20 Jun 2024
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
Zhuoxiao Chen
Zixin Wang
Yadan Luo
Sen Wang
Zi Huang
AAML
3DPC
211
3
0
19 Jun 2024
Low-Resource Machine Translation through the Lens of Personalized Federated Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Viktor Moskvoretskii
N. Tupitsa
Chris Biemann
Samuel Horváth
Eduard A. Gorbunov
Irina Nikishina
FedML
189
1
0
18 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
281
2
0
17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
309
6
0
14 Jun 2024
When Will Gradient Regularization Be Harmful?
International Conference on Machine Learning (ICML), 2024
Yang Zhao
Hao Zhang
Xiuyuan Hu
AI4CE
149
2
0
14 Jun 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai
Jingfeng Wu
Song Mei
Michael Lindsey
Peter L. Bartlett
348
12
0
12 Jun 2024
Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker-Planck Equation
Shuyu Yin
Fei Wen
Peilin Liu
Tao Luo
278
0
0
12 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
489
4
0
12 Jun 2024
Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen
Quyen Tran
Tuan Truong
Thanh-Toan Do
Dinh Q. Phung
Trung Le
397
1
0
11 Jun 2024
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao
Kaiqi Zhang
Esha Singh
Daniel Soudry
Yu-Xiang Wang
NoLa
295
7
0
10 Jun 2024
Revisiting Catastrophic Forgetting in Large Language Model Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Hongyu Li
Liang Ding
Meng Fang
Dacheng Tao
CLL
KELM
216
58
0
07 Jun 2024
Error Bounds of Supervised Classification from Information-Theoretic Perspective
Binchuan Qi
Wei Gong
Li Li
283
0
0
07 Jun 2024
Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection
Yinting Wu
Pai Peng
Bo Cai
Le Li
AAML
248
0
0
06 Jun 2024
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus
Maksim Kuznetsov
Roman Schutski
Rim Shayakhmetov
Daniil Polykovskiy
Sarath Chandar
Alex Zhavoronkov
DiffM
AI4CE
216
15
0
06 Jun 2024
A Universal Class of Sharpness-Aware Minimization Algorithms
B. Tahmasebi
Ashkan Soleymani
Dara Bahri
Stefanie Jegelka
Patrick Jaillet
AAML
384
10
0
06 Jun 2024
Cyclic Sparse Training: Is it Enough?
Advait Gadhikar
Sree Harsha Nelaturu
R. Burkholz
CLL
449
1
0
04 Jun 2024
Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho
Yoshihiro Sakai
Kenshiro Tanaka
Mariko Kato
Naoya Inoue
296
3
0
03 Jun 2024
Mixup Augmentation with Multiple Interpolations
Lifeng Shen
Jincheng Yu
Hansi Yang
James T. Kwok
335
0
0
03 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
269
12
0
31 May 2024
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Mitchell Springer
Vaishnavh Nagarajan
Aditi Raghunathan
351
11
0
30 May 2024
Near Optimal Decentralized Optimization with Compression and Momentum Tracking
Rustem Islamov
Yuan Gao
Sebastian U. Stich
240
0
0
30 May 2024
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization
Ziqing Fan
Shengchao Hu
Jiangchao Yao
Gang Niu
Ya Zhang
Masashi Sugiyama
Yanfeng Wang
FedML
292
31
0
29 May 2024
Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
Ruipeng Zhang
Ziqing Fan
Jiangchao Yao
Ya Zhang
Yanfeng Wang
276
9
0
29 May 2024
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
222
1
0
29 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
232
1
0
28 May 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei
Di Hu
303
62
0
28 May 2024
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Sheng-Hsuan Peng
Pin-Yu Chen
Matthew Hull
Duen Horng Chau
339
51
0
27 May 2024
MCGAN: Enhancing GAN Training with Regression-Based Generator Loss
Baoren Xiao
Hao Ni
Weixin Yang
GAN
611
3
0
27 May 2024
The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective
Nils Philipp Walter
Linara Adilova
Jilles Vreeken
Michael Kamp
AAML
256
2
0
27 May 2024
Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
Runqi Lin
Chaojian Yu
Bo Han
Hang Su
Tongliang Liu
AAML
396
5
0
25 May 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
510
17
1
25 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
278
5
0
24 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Neural Information Processing Systems (NeurIPS), 2024
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
296
25
0
23 May 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
358
6
0
23 May 2024
Improving Generalization of Deep Neural Networks by Optimum Shifting
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuyan Zhou
Ye Li
Lei Feng
Sheng-Jun Huang
OOD
ODL
178
0
0
23 May 2024
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion
Lénaic Chizat
ODL
307
13
0
22 May 2024
SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
Sakshi Choudhary
Sai Aparna Aketi
Kaushik Roy
FedML
363
1
0
22 May 2024
Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Xin-Chun Li
Jinli Tang
Bo Zhang
Lan Li
De-Chuan Zhan
308
2
0
21 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu
Xin-Chun Li
Lan Li
De-Chuan Zhan
344
2
0
21 May 2024
Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features
Junpeng Zhang
Qing Li
Liang Lin
Quanshi Zhang
AI4CE
364
6
0
16 May 2024
MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Xingyu Li
Bo Tang
VLM
CLL
174
0
0
15 May 2024
Why is SAM Robust to Label Noise?
International Conference on Learning Representations (ICLR), 2024
Christina Baek
Zico Kolter
Aditi Raghunathan
NoLa
AAML
310
20
0
06 May 2024
Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Communications in Computational Physics (Commun. Comput. Phys.), 2024
Zhiwei Wang
Lulu Zhang
Zhongwang Zhang
Z. Xu
201
2
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
349
5
0
02 May 2024
PackVFL: Efficient HE Packing for Vertical Federated Learning
Liu Yang
Shuowei Cai
Di Chai
Junxue Zhang
Han Tian
Yilun Jin
Kun Guo
Kai Chen
Qiang Yang
FedML
226
1
0
01 May 2024