arXiv:1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 shown)
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Jie You, Jaehoon Chung, Mosharaf Chowdhury | 80 · 82 · 0 | 12 Aug 2022

Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Yucong Liu, Shixing Yu, Tong Lin | 48 · 1 · 0 | 11 Aug 2022

A Container-Based Workflow for Distributed Training of Deep Learning Algorithms in HPC Clusters
Jose González-Abad, Álvaro López García, Valentin Kozlov | 39 · 6 · 0 | 04 Aug 2022

Learning Hyper Label Model for Programmatic Weak Supervision
Renzhi Wu, Sheng Chen, Jieyu Zhang, Xu Chu | 112 · 17 · 0 | 27 Jul 2022

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity [AAML]
Martin Gubri, Maxime Cordy, Mike Papadakis, Yves Le Traon, Koushik Sen | 75 · 55 · 0 | 26 Jul 2022

On the benefits of non-linear weight updates
Paul Norridge | 48 · 0 · 0 | 25 Jul 2022

Deep Laparoscopic Stereo Matching with Transformers [ViT, MedIm]
Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, Zongyuan Ge | 74 · 15 · 0 | 25 Jul 2022

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale [CLL]
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, A. Raju, ..., Andrew Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure | 95 · 8 · 0 | 19 Jul 2022

PointNorm: Dual Normalization is All You Need for Point Cloud Analysis [3DPC]
Shen Zheng, Jinqian Pan, Chang-Tien Lu, Gaurav Gupta | 86 · 7 · 0 | 13 Jul 2022

Implicit regularization of dropout
Zhongwang Zhang, Zhi-Qin John Xu | 70 · 29 · 0 | 13 Jul 2022

Towards understanding how momentum improves generalization in deep learning [ODL, MLT, AI4CE]
Samy Jelassi, Yuanzhi Li | 90 · 38 · 0 | 13 Jul 2022

PAC-Bayesian Domain Adaptation Bounds for Multiclass Learners [BDL]
Anthony Sicilia, Katherine Atwell, Malihe Alikhani, Seong Jae Hwang | 93 · 10 · 0 | 12 Jul 2022

The alignment property of SGD noise and how it helps select flat minima: A stability analysis [MLT]
Lei Wu, Mingze Wang, Weijie Su | 101 · 34 · 0 | 06 Jul 2022

An Empirical Study of Implicit Regularization in Deep Offline RL [OffRL]
Çağlar Gülçehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet | 88 · 17 · 0 | 05 Jul 2022

Predicting Out-of-Domain Generalization with Neighborhood Invariance [OOD]
Nathan Ng, Neha Hulkund, Kyunghyun Cho, Marzyeh Ghassemi | 52 · 5 · 0 | 05 Jul 2022

Federated Self-supervised Learning for Video Understanding [FedML]
Yasar Abbas Ur Rehman, Yan Gao, Jiajun Shen, Pedro Porto Buarque de Gusmão, Nicholas D. Lane | 75 · 15 · 0 | 05 Jul 2022

PoF: Post-Training of Feature Extractor for Improving Generalization
Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami | 37 · 4 · 0 | 05 Jul 2022

Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Edoardo Cetin, Philip J. Ball, Steve Roberts, Oya Celiktutan | 112 · 38 · 0 | 03 Jul 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, Shaoshuai Shi, Wei Wang, Yue Liu | 65 · 10 · 0 | 30 Jun 2022

On the Maximum Hessian Eigenvalue and Generalization
Simran Kaur, Jérémy E. Cohen, Zachary Chase Lipton | 101 · 43 · 0 | 21 Jun 2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation [NoLa]
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion | 89 · 34 · 0 | 20 Jun 2022

Disentangling Model Multiplicity in Deep Learning
Ari Heljakka, Martin Trapp, Arno Solin | 65 · 4 · 0 | 17 Jun 2022

Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin | 136 · 28 · 0 | 17 Jun 2022

How You Start Matters for Generalization [ODL, AI4CE]
Sameera Ramasinghe, L. MacDonald, M. Farazi, Hemanth Saratchandran, Simon Lucey | 89 · 6 · 0 | 17 Jun 2022

Revisiting Self-Distillation
M. Pham, Minsu Cho, Ameya Joshi, Chinmay Hegde | 101 · 23 · 0 | 17 Jun 2022

Methods for Estimating and Improving Robustness of Language Models
Michal Stefánik | 52 · 3 · 0 | 16 Jun 2022

A Closer Look at Smoothness in Domain Adversarial Training
Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu | 116 · 122 · 0 | 16 Jun 2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington | 63 · 14 · 0 | 15 Jun 2022

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann, J. Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, ..., Benedikt Höltgen, Aidan Gomez, Adrien Morisot, Sebastian Farquhar, Y. Gal | 128 · 165 · 0 | 14 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction [FAtt]
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora | 121 · 75 · 0 | 14 Jun 2022

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [OOD]
Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, L. Horesh, Min-Fong Hong, Sijia Liu | 73 · 12 · 0 | 13 Jun 2022

Towards Understanding Sharpness-Aware Minimization [AAML]
Maksym Andriushchenko, Nicolas Flammarion | 115 · 142 · 0 | 13 Jun 2022

Modeling the Machine Learning Multiverse
Samuel J. Bell, Onno P. Kampman, Jesse Dodge, Neil D. Lawrence | 78 · 18 · 0 | 13 Jun 2022

Fisher SAM: Information Geometry and Sharpness Aware Minimisation [AAML]
Minyoung Kim, Da Li, S. Hu, Timothy M. Hospedales | 87 · 72 · 0 | 10 Jun 2022

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [AAML]
Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang, Zhangyang Wang | 74 · 8 · 0 | 09 Jun 2022

Explicit Regularization in Overparametrized Models via Noise Injection
Antonio Orvieto, Anant Raj, Hans Kersting, Francis R. Bach | 73 · 27 · 0 | 09 Jun 2022

Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
Chengli Tan, Jiang Zhang, Junmin Liu | 78 · 1 · 0 | 09 Jun 2022

Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning
Momin Abbas, Quan-Wu Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen | 111 · 84 · 0 | 08 Jun 2022

Generalized Federated Learning via Sharpness Aware Minimization [FedML]
Zhe Qu, Xingyu Li, Rui Duan, Yaojiang Liu, Bo Tang, Zhuo Lu | 108 · 142 · 0 | 06 Jun 2022

Two Decades of Bengali Handwritten Digit Recognition: A Survey
A. A. Ashikur Rahman, Md. Bakhtiar Hasan, Sabbir Ahmed, Tasnim Ahmed, Md. Hamjajul Ashmafee, Mohammad Ridwan Kabir, M. H. Kabir | 80 · 26 · 0 | 05 Jun 2022

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives [ViT, OOD, MedIm]
Jun Li, Junyu Chen, Yucheng Tang, Ce Wang, Bennett A. Landman, S. K. Zhou | 169 · 43 · 0 | 02 Jun 2022

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules
Yuhan Helena Liu, Arna Ghosh, Blake A. Richards, E. Shea-Brown, Guillaume Lajoie | 88 · 10 · 0 | 02 Jun 2022

The Phenomenon of Policy Churn
Tom Schaul, André Barreto, John Quan, Georg Ostrovski | 89 · 28 · 0 | 01 Jun 2022

Special Properties of Gradient Descent with Large Learning Rates [MLT]
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich | 103 · 9 · 0 | 30 May 2022

Metrizing Fairness [FaML]
Yves Rychener, Bahar Taşkesen, Daniel Kuhn | 64 · 4 · 0 | 30 May 2022

Sharpness-Aware Training for Free [AAML]
Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Qiufeng Wang | 103 · 96 · 0 | 27 May 2022

Gaussian Universality of Perceptrons with Random Labels
Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová | 104 · 24 · 0 | 26 May 2022

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Yaoyu Zhang, Z. Xu | 122 · 6 · 0 | 26 May 2022

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na, Sanket Vaibhav Mehta, Emma Strubell | 106 · 20 · 0 | 25 May 2022

Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra | 327 · 48 · 0 | 24 May 2022