arXiv:1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Papers citing
"Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
20 / 170 papers shown
Title
Universal Sentence Representation Learning with Conditional Masked Language Model
Ziyi Yang
Yinfei Yang
Daniel Matthew Cer
Jax Law
Eric F. Darve
SSL
19
57
0
28 Dec 2020
Self supervised contrastive learning for digital histopathology
Ozan Ciga
Tony Xu
Anne L. Martel
SSL
103
305
0
27 Nov 2020
Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup
Cheng Yang
Shengnan Wang
Chao Yang
Yuechuan Li
Ru He
Jingqiao Zhang
24
25
0
27 Nov 2020
Morphological Disambiguation from Stemming Data
Antoine Nzeyimana
7
5
0
11 Nov 2020
Explainable COVID-19 Detection Using Chest CT Scans and Deep Learning
H. Alshazly
C. Linse
Erhardt Barth
T. Martinetz
6
159
0
09 Nov 2020
Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar
James Bradbury
C. Young
Yu Emma Wang
Anselm Levskaya
...
Tao Wang
Tayo Oguntebi
Yazhou Zu
Yuanzhong Xu
Andy Swing
BDL
AIMat
MoE
LRM
9
27
0
07 Nov 2020
Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks
M. Fenton
Alexander Shmakov
Ta-Wei Ho
S. Hsu
D. Whiteson
Pierre Baldi
39
37
0
19 Oct 2020
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
Yingqi Qu
Yuchen Ding
Jing Liu
Kai Liu
Ruiyang Ren
Xin Zhao
Daxiang Dong
Hua-Hong Wu
Haifeng Wang
RALM
OffRL
209
593
0
16 Oct 2020
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
ODL
8
8
0
28 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
19
37
0
09 Jul 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Łańcucki
6
332
0
11 Jun 2020
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
VLM
19
2,832
0
09 Jun 2020
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu
Chun-Chen Hsieh
Yen-Hao Chen
Po-Han Chi
Hung-yi Lee
13
1
0
09 Jun 2020
DeepRx: Fully Convolutional Deep Learning Receiver
M. Honkala
D. Korpi
J. Huttunen
30
133
0
04 May 2020
Solving Raven's Progressive Matrices with Multi-Layer Relation Networks
Marius Jahrens
T. Martinetz
AIMat
GNN
9
29
0
25 Mar 2020
FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings
Niels Ole Salscheider
6
36
0
18 Feb 2020
On the distance between two neural networks and the stability of learning
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Ming-Yu Liu
ODL
190
57
0
09 Feb 2020
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
17
277
0
15 Dec 2019
On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
28
771
0
25 Oct 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,886
0
15 Sep 2016