On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore
Edward Gow-Smith
Mark McConville
W. Gillies
Jade Scott
R. Maolalaigh
AI4CE
46
2
0
23 May 2022
Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent
Soon Hoe Lim
Yijun Wan
Umut Şimşekli
86
12
0
23 May 2022
GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Model
Wenbo Su
Yuanxing Zhang
Yufeng Cai
Kaixu Ren
Pengjie Wang
...
Jing Chen
Hongbo Deng
Jian Xu
Lin Qu
Bo Zheng
64
5
0
23 May 2022
FedAdapter: Efficient Federated Learning for Modern NLP
Dongqi Cai
Yaozong Wu
Shangguang Wang
F. Lin
Mengwei Xu
FedML, AI4CE
72
23
0
20 May 2022
Kernel Normalized Convolutional Networks
Reza Nasirigerdeh
Reihaneh Torkzadehmahani
Daniel Rueckert
Georgios Kaissis
52
2
0
20 May 2022
Scalable algorithms for physics-informed neural and graph networks
K. Shukla
Mengjia Xu
N. Trask
George Karniadakis
PINN, AI4CE
131
41
0
16 May 2022
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
Keitaro Sakamoto
Issei Sato
83
9
0
15 May 2022
Evaluating the Generalization Ability of Super-Resolution Networks
Yihao Liu
Hengyuan Zhao
Jinjin Gu
Yu Qiao
Chao Dong
71
26
0
14 May 2022
Investigating Generalization by Controlling Normalized Margin
Alexander R. Farhang
Jeremy Bernstein
Kushal Tirumala
Yang Liu
Yisong Yue
83
6
0
08 May 2022
Large Scale Transfer Learning for Differentially Private Image Classification
Harsh Mehta
Abhradeep Thakurta
Alexey Kurakin
Ashok Cutkosky
85
41
0
06 May 2022
UnrealNAS: Can We Search Neural Architectures with Unreal Data?
Zhen Dong
Kaichen Zhou
Ge Li
Qiang Zhou
Mingfei Guo
Guohao Li
Kurt Keutzer
Shanghang Zhang
50
0
0
04 May 2022
Meta-free few-shot learning via representation learning with weight averaging
Kuilin Chen
Chi-Guhn Lee
56
5
0
26 Apr 2022
Hybridised Loss Functions for Improved Neural Network Generalisation
Matthew C. Dickson
Anna Sergeevna Bosman
K. Malan
23
16
0
26 Apr 2022
Theoretical Understanding of the Information Flow on Continual Learning Performance
Joshua Andle
Salimeh Yasaei Sekeh
CLL
64
8
0
26 Apr 2022
Federated Geometric Monte Carlo Clustering to Counter Non-IID Datasets
Federico Lucchetti
Jérémie Decouchant
Maria Fernandes
L. Chen
Marcus Volp
FedML
50
0
0
23 Apr 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng
Peng Xu
Xuan Zou
Da Tang
Zhen Li
...
Xiangzhuo Ding
Fuzhao Xue
Ziheng Qing
Youlong Cheng
Yang You
VLM
80
7
0
13 Apr 2022
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks
Aleksandar Doknic
Torsten Moller
100
2
0
09 Apr 2022
Differentially Private Sampling from Rashomon Sets, and the Universality of Langevin Diffusion for Convex Optimization
Arun Ganesh
Abhradeep Thakurta
Jalaj Upadhyay
79
1
0
04 Apr 2022
The Group Loss++: A deeper look into group loss for deep metric learning
Ismail Elezi
Jenny Seidenschwarz
Laurin Wagner
Sebastiano Vascon
Alessandro Torcinovich
Marcello Pelillo
Laura Leal-Taixe
71
12
0
04 Apr 2022
Learning to Accelerate by the Methods of Step-size Planning
Hengshuai Yao
76
0
0
01 Apr 2022
It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
Kanghyun Choi
Hye Yoon Lee
Deokki Hong
Joonsang Yu
Noseong Park
Youngsok Kim
Jinho Lee
MQ
104
33
0
31 Mar 2022
Exploiting Explainable Metrics for Augmented SGD
Mahdi S. Hosseini
Mathieu Tuli
Konstantinos N. Plataniotis
AAML
61
3
0
31 Mar 2022
Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries
Haekyu Park
Seongmin Lee
Benjamin Hoover
Austin P. Wright
Omar Shaikh
Rahul Duggal
Nilaksh Das
Kevin Wenliang Li
Judy Hoffman
Duen Horng Chau
60
2
0
30 Mar 2022
Acknowledging the Unknown for Multi-label Learning with Single Positive Labels
Donghao Zhou
Pengfei Chen
Qiong Wang
Guangyong Chen
Pheng-Ann Heng
62
31
0
30 Mar 2022
A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
R. Hebbalaguppe
Jatin Prakash
Neelabh Madan
Chetan Arora
UQCV
117
45
0
25 Mar 2022
A Comparative Survey of Deep Active Learning
Xueying Zhan
Qingzhong Wang
Kuan-Hao Huang
Haoyi Xiong
Dejing Dou
Antoni B. Chan
FedML, HAI
129
112
0
25 Mar 2022
Improving Generalization in Federated Learning by Seeking Flat Minima
Debora Caldarola
Barbara Caputo
Marco Ciccone
FedML
101
112
0
22 Mar 2022
The activity-weight duality in feed forward neural networks: The geometric determinants of generalization
Yu Feng
Yuhai Tu
MLT
112
16
0
21 Mar 2022
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio
Andrei Popescu-Belis
64
6
0
20 Mar 2022
PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs
Zehao Dong
Muhan Zhang
Fuhai Li
Yixin Chen
CML, GNN
111
19
0
19 Mar 2022
Incremental Few-Shot Learning via Implanting and Compressing
Yiting Li
H. Zhu
Xijia Feng
Zilong Cheng
Jun Ma
Cheng Xiang
P. Vadakkepat
T. Lee
CLL, VLM
85
2
0
19 Mar 2022
On the Generalization Mystery in Deep Learning
S. Chatterjee
Piotr Zielinski
OOD
65
35
0
18 Mar 2022
Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
34
10
0
18 Mar 2022
Confidence Dimension for Deep Learning based on Hoeffding Inequality and Relative Evaluation
Runqi Wang
Linlin Yang
Baochang Zhang
Wentao Zhu
David Doermann
Guodong Guo
35
1
0
17 Mar 2022
Towards understanding deep learning with the natural clustering prior
Simon Carbonnelle
52
0
0
15 Mar 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Huayu Chen
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
105
157
0
15 Mar 2022
Phenomenology of Double Descent in Finite-Width Neural Networks
Sidak Pal Singh
Aurelien Lucchi
Thomas Hofmann
Bernhard Schölkopf
67
9
0
14 Mar 2022
Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD
Bapi Chatterjee
Vyacheslav Kungurtsev
Dan Alistarh
FedML
54
2
0
13 Mar 2022
GRAND+: Scalable Graph Random Neural Networks
Wenzheng Feng
Yuxiao Dong
Tinglin Huang
Ziqi Yin
Xu Cheng
Evgeny Kharlamov
Jie Tang
GNN
68
43
0
12 Mar 2022
Enhancing Adversarial Training with Second-Order Statistics of Weights
Gao Jin
Xinping Yi
Wei Huang
S. Schewe
Xiaowei Huang
AAML
89
48
0
11 Mar 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei
Ruihao Gong
Yuhang Li
Xianglong Liu
F. Yu
MQ, VLM
98
178
0
11 Mar 2022
Boosting Mask R-CNN Performance for Long, Thin Forensic Traces with Pre-Segmentation and IoU Region Merging
Moritz Zink
M. Schiele
Pengcheng Fan
Stephan Gasterstädt
SSeg
25
0
0
08 Mar 2022
Flat minima generalize for low-rank matrix recovery
Lijun Ding
Dmitriy Drusvyatskiy
Maryam Fazel
Zaid Harchaoui
82
18
0
07 Mar 2022
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
Weixin Liang
Yuhui Zhang
Yongchan Kwon
Serena Yeung
James Zou
VLM
147
430
0
03 Mar 2022
An Information-Theoretic Framework for Supervised Learning
Hong Jun Jeon
Yifan Zhu
Benjamin Van Roy
92
7
0
01 Mar 2022
Adversarial robustness of sparse local Lipschitz predictors
Ramchandran Muthukumar
Jeremias Sulam
AAML
92
13
0
26 Feb 2022
On PAC-Bayesian reconstruction guarantees for VAEs
Badr-Eddine Chérief-Abdellatif
Yuyang Shi
Arnaud Doucet
Benjamin Guedj
DRL
107
19
0
23 Feb 2022
Privacy Leakage of Adversarial Training Models in Federated Learning Systems
Jingyang Zhang
Yiran Chen
Hai Helen Li
FedML, PICV
134
16
0
21 Feb 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
74
11
0
21 Feb 2022
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning
Dong Gong
Qingsen Yan
Yuhang Liu
Anton Van Den Hengel
Javen Qinfeng Shi
CLL, BDL
89
40
0
21 Feb 2022