arXiv 1609.04836 (v2, latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
Scaling physics-informed hard constraints with mixture-of-experts
N. Chalapathi
Yiheng Du
Aditi Krishnapriyan
AI4CE
94
16
0
20 Feb 2024
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Yao Shu
Jiongfeng Fang
Y. He
Fei Richard Yu
61
0
0
18 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
70
6
0
17 Feb 2024
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert
Ambroise Odonnat
Vasilii Feofanov
Aladin Virmaux
Giuseppe Paolo
Themis Palpanas
I. Redko
AI4TS
93
30
0
15 Feb 2024
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
60
1
0
13 Feb 2024
Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
D. Sahabandu
Xiaojun Xu
Arezoo Rajabi
Luyao Niu
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
AAML
68
1
0
12 Feb 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
81
2
0
07 Feb 2024
Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida
Snehanshu Saha
97
0
0
07 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Omead Brandon Pooladzandi
Xi-Lin Li
83
8
0
07 Feb 2024
Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
Ossi Raisa
Hibiki Ito
Antti Honkela
75
6
0
06 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
Artem Vysogorets
Anna Dawid
Julia Kempe
63
1
0
05 Feb 2024
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
Baoyuan Wu
Hongrui Chen
Ruotong Wang
Zihao Zhu
Shaokui Wei
Danni Yuan
Mingli Zhu
Ke Xu
Li Liu
Chaoxiao Shen
AAML
ELM
127
11
0
26 Jan 2024
Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN
Minsoo Kang
Minkoo Kang
Suhyun Kim
29
3
0
24 Jan 2024
DALex: Lexicase-like Selection via Diverse Aggregation
Andrew Ni
Lijie Ding
Lee Spector
94
6
0
23 Jan 2024
A Precise Characterization of SGD Stability Using Loss Surface Geometry
Gregory Dexter
Borja Ocejo
S. Keerthi
Aman Gupta
Ayan Acharya
Rajiv Khanna
MLT
73
0
0
22 Jan 2024
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Leonardo Castro-Gonzalez
Yi-Ling Chung
Hannah Rose Kirk
John Francis
Angus R. Williams
Pica Johansson
Jonathan Bright
69
1
0
22 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
157
6
0
22 Jan 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
70
6
0
21 Jan 2024
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness
Yifan Hao
Tong Zhang
AAML
139
5
0
19 Jan 2024
Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach
David Fleischhacker
Wolfgang Goederle
Roman Kern
33
2
0
15 Jan 2024
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
73
2
0
14 Jan 2024
EsaCL: Efficient Continual Learning of Sparse Models
Weijieying Ren
V. Honavar
CLL
50
3
0
11 Jan 2024
Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors
Yiran Huang
Hai-qiang Zhao
Yexu Zhou
T. Riedel
Michael Beigl
43
2
0
10 Jan 2024
Preserving Silent Features for Domain Generalization
Chujie Zhao
Tianren Zhang
Feng Chen
85
0
0
06 Jan 2024
Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation
Xuannan Liu
Yaoyao Zhong
Weihong Deng
Hongzhi Shi
Xingchen Cui
Yunfeng Yin
Dongchao Wen
PICV
FedML
65
1
0
03 Jan 2024
f-Divergence Based Classification: Beyond the Use of Cross-Entropy
Nicola Novello
Andrea M. Tonello
72
8
0
02 Jan 2024
Hidden Minima in Two-Layer ReLU Networks
Yossi Arjevani
99
3
0
28 Dec 2023
Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing
Raffaele Marino
L. Buffoni
Lorenzo Chicchi
Lorenzo Giambagli
Duccio Fanelli
84
1
0
22 Dec 2023
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Tao Wu
Tie Luo
D. C. Wunsch
56
3
0
21 Dec 2023
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
54
3
0
20 Dec 2023
LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate
Tao Wu
Tie Luo
D. C. Wunsch
74
6
0
20 Dec 2023
Doubly Perturbed Task Free Continual Learning
Byung Hyun Lee
Min-hwan Oh
Se Young Chun
75
3
0
20 Dec 2023
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
Weixi Song
Z. Li
Lefei Zhang
Hai Zhao
Bo Du
VLM
67
8
0
19 Dec 2023
Mixture-of-Linear-Experts for Long-term Time Series Forecasting
Ronghao Ni
Zinan Lin
Shuaiqi Wang
Giulia Fanti
AI4TS
59
18
0
11 Dec 2023
PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
Md Zarif Ul Alam
Md. Saiful Islam
Ehsan Hoque
M. S. Rahman
OOD
35
0
0
10 Dec 2023
Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models
S. Hemati
Mahdi Beitollahi
A. Estiri
Bassel Al Omari
Xi Chen
Guojun Zhang
64
7
0
08 Dec 2023
Simplifying Neural Network Training Under Class Imbalance
Ravid Shwartz-Ziv
Micah Goldblum
Yucen Lily Li
C. Bayan Bruss
Andrew Gordon Wilson
106
17
0
05 Dec 2023
Optimal Sample Complexity of Contrastive Learning
Noga Alon
Dmitrii Avdiukhin
Dor Elboim
Orr Fischer
G. Yaroslavtsev
SSL
66
7
0
01 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski
Haruto Tanaka
Dale Schuurmans
Marlos C. Machado
82
7
0
30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
156
2
0
29 Nov 2023
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
Zhengming Zhang
Yongming Huang
Cheng Zhang
Qingbi Zheng
Luxi Yang
Xiaohu You
61
14
0
28 Nov 2023
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning
Soumya Banerjee
Sandip Roy
Sayyed Farid Ahamed
Devin Quinn
Marc Vucovich
Dhruv Nandakumar
K. Choi
Abdul Rahman
Edward Bowen
Sachin Shetty
63
5
0
28 Nov 2023
Should We Learn Most Likely Functions or Parameters?
Shikai Qiu
Tim G. J. Rudner
Sanyam Kapoor
Andrew Gordon Wilson
38
6
0
27 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Mingze Wang
Zeping Min
Lei Wu
84
3
0
24 Nov 2023
SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
Hua Zheng
Kuang-Hung Liu
Igor Fedorov
Xin Zhang
Wen-Yen Chen
Wei Wen
86
2
0
22 Nov 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
75
14
0
22 Nov 2023
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
Ahmed Sharshar
Aleksandr Matsun
47
3
0
21 Nov 2023
Robust Contrastive Learning With Theory Guarantee
Ngoc N. Tran
Lam C. Tran
Hoang Phan
Anh-Vu Bui
Tung Pham
Toan M. Tran
Dinh Q. Phung
Trung Le
SSL
NoLa
68
0
0
16 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
63
3
0
15 Nov 2023
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
86
2
0
13 Nov 2023