1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,653 papers shown
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
209
3
0
13 Feb 2024
Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
D. Sahabandu
Xiaojun Xu
Arezoo Rajabi
Luyao Niu
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
AAML
208
1
0
12 Feb 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
223
2
0
07 Feb 2024
Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida
Snehanshu Saha
327
0
0
07 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Omead Brandon Pooladzandi
Xi-Lin Li
245
10
0
07 Feb 2024
Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
Ossi Raisa
Hibiki Ito
Antti Honkela
249
8
0
06 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
International Conference on Machine Learning (ICML), 2024
Artem Vysogorets
Anna Dawid
Julia Kempe
249
3
0
05 Feb 2024
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato
Hideaki Iiduka
ODL
453
1
0
04 Feb 2024
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
International Journal of Computer Vision (IJCV), 2024
Baoyuan Wu
Hongrui Chen
Ruotong Wang
Zihao Zhu
Shaokui Wei
Danni Yuan
Mingli Zhu
Ke Xu
Li Liu
Chaoxiao Shen
AAML
ELM
281
19
0
26 Jan 2024
Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN
AAAI Conference on Artificial Intelligence (AAAI), 2024
Minsoo Kang
Minkoo Kang
Suhyun Kim
129
7
0
24 Jan 2024
DALex: Lexicase-like Selection via Diverse Aggregation
European Conference on Genetic Programming (EuroGP), 2024
Andrew Ni
Lijie Ding
Lee Spector
260
8
0
23 Jan 2024
A Precise Characterization of SGD Stability Using Loss Surface Geometry
International Conference on Learning Representations (ICLR), 2024
Gregory Dexter
Borja Ocejo
S. Keerthi
Aman Gupta
Ayan Acharya
Rajiv Khanna
MLT
249
1
0
22 Jan 2024
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Leonardo Castro-Gonzalez
Yi-Ling Chung
Hannah Rose Kirk
John Francis
Angus R. Williams
Pica Johansson
Jonathan Bright
284
2
0
22 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
500
10
0
22 Jan 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
268
8
0
21 Jan 2024
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness
Yifan Hao
Tong Zhang
AAML
508
6
0
19 Jan 2024
Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach
David Fleischhacker
Wolfgang Goederle
Roman Kern
109
6
0
15 Jan 2024
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
320
5
0
14 Jan 2024
EsaCL: Efficient Continual Learning of Sparse Models
SIAM International Conference on Data Mining (SDM), 2024
Weijieying Ren
V. Honavar
CLL
198
4
0
11 Jan 2024
Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors
International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous), 2024
Yiran Huang
Hai-qiang Zhao
Yexu Zhou
T. Riedel
Michael Beigl
123
3
0
10 Jan 2024
Preserving Silent Features for Domain Generalization
Chujie Zhao
Tianren Zhang
Feng Chen
277
0
0
06 Jan 2024
Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xuannan Liu
Yaoyao Zhong
Weihong Deng
Hongzhi Shi
Xingchen Cui
Yunfeng Yin
Dongchao Wen
PICV
FedML
181
2
0
03 Jan 2024
f-Divergence Based Classification: Beyond the Use of Cross-Entropy
International Conference on Machine Learning (ICML), 2024
Nicola Novello
Andrea M. Tonello
316
17
0
02 Jan 2024
Hidden Minima in Two-Layer ReLU Networks
Yossi Arjevani
356
3
0
28 Dec 2023
Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing
Raffaele Marino
L. Buffoni
Lorenzo Chicchi
Lorenzo Giambagli
Duccio Fanelli
338
1
0
22 Dec 2023
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Tao Wu
Tie Luo
D. C. Wunsch
228
11
0
21 Dec 2023
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
177
5
0
20 Dec 2023
LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate
Tao Wu
Tie Luo
D. C. Wunsch
256
7
0
20 Dec 2023
Doubly Perturbed Task Free Continual Learning
Byung Hyun Lee
Min-hwan Oh
Se Young Chun
344
5
0
20 Dec 2023
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
Weixi Song
Z. Li
Lefei Zhang
Hai Zhao
Bo Du
VLM
377
12
0
19 Dec 2023
Mixture-of-Linear-Experts for Long-term Time Series Forecasting
Ronghao Ni
Zinan Lin
Shuaiqi Wang
Giulia Fanti
AI4TS
276
42
0
11 Dec 2023
PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
Md Zarif Ul Alam
Md. Saiful Islam
Ehsan Hoque
M. S. Rahman
OOD
196
1
0
10 Dec 2023
Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models
S. Hemati
Mahdi Beitollahi
A. Estiri
Bassel Al Omari
Xi Chen
Guojun Zhang
175
9
0
08 Dec 2023
Simplifying Neural Network Training Under Class Imbalance
Neural Information Processing Systems (NeurIPS), 2023
Ravid Shwartz-Ziv
Micah Goldblum
Yucen Lily Li
C. Bayan Bruss
Andrew Gordon Wilson
275
32
0
05 Dec 2023
Optimal Sample Complexity of Contrastive Learning
International Conference on Learning Representations (ICLR), 2023
Noga Alon
Dmitrii Avdiukhin
Dor Elboim
Orr Fischer
G. Yaroslavtsev
SSL
291
11
0
01 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski
Haruto Tanaka
Dale Schuurmans
Marlos C. Machado
453
16
0
30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
802
2
0
29 Nov 2023
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
IEEE Transactions on Communications (IEEE Trans. Commun.), 2023
Zhengming Zhang
Yongming Huang
Cheng Zhang
Qingbi Zheng
Luxi Yang
Xiaohu You
271
39
0
28 Nov 2023
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning
International Conference on Computing, Networking and Communications (ICNC), 2023
Soumya Banerjee
Sandip Roy
Sayyed Farid Ahamed
Devin Quinn
Marc Vucovich
Dhruv Nandakumar
K. Choi
Abdul Rahman
Edward Bowen
Sachin Shetty
255
10
0
28 Nov 2023
Should We Learn Most Likely Functions or Parameters?
Neural Information Processing Systems (NeurIPS), 2023
Shikai Qiu
Tim G. J. Rudner
Sanyam Kapoor
Andrew Gordon Wilson
142
11
0
27 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
International Conference on Machine Learning (ICML), 2023
Mingze Wang
Zeping Min
Lei Wu
491
3
0
24 Nov 2023
SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
Hua Zheng
Kuang-Hung Liu
Igor Fedorov
Xin Zhang
Wen-Yen Chen
Wei Wen
301
2
0
22 Nov 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
350
30
0
22 Nov 2023
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
Ahmed Sharshar
Aleksandr Matsun
198
5
0
21 Nov 2023
Generalization Bounds for Robust Contrastive Learning: From Theory to Practice
Ngoc N. Tran
Lam C. Tran
Hoang Phan
Anh-Vu Bui
Tung Pham
Toan M. Tran
Dinh Q. Phung
Trung Le
SSL
NoLa
384
0
0
16 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
394
4
0
15 Nov 2023
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
275
2
0
13 Nov 2023
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment
Matt Gorbett
Hossein Shirazi
Indrakshi Ray
FedML
425
2
0
08 Nov 2023
EControl: Fast Distributed Optimization with Compression and Error Control
International Conference on Learning Representations (ICLR), 2023
Yuan Gao
Rustem Islamov
Sebastian U. Stich
261
17
0
06 Nov 2023
The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
Neural Information Processing Systems (NeurIPS), 2023
Artyom Gadetsky
Maria Brbić
274
9
0
06 Nov 2023