Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
    ODL

Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"

50 / 160 papers shown
Pretraining Large Brain Language Model for Active BCI: Silent Speech
Jinzhao Zhou
Zehong Cao
Yiqun Duan
Connor Barkley
Daniel Leong
...
Ziyi Zhao
T. Do
Yu-Cheng Chang
Sheng-Fu Liang
Chin-Teng Lin
32
0
0
29 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
46
0
0
22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Low-Bit Integerization of Vision Transformers using Operand Reodering for Efficient Hardware
Ching-Yi Lin
Sahil Shah
MQ
64
0
0
11 Apr 2025
Model Diffusion for Certifiable Few-shot Transfer Learning
Fady Rezk
Royson Lee
H. Gouk
Timothy M. Hospedales
Minyoung Kim
48
0
0
10 Feb 2025
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm
DiffM
127
0
0
07 Feb 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
108
0
0
22 Jan 2025
A Hessian-informed hyperparameter optimization for differential learning rate
Shiyun Xu
Zhiqi Bu
Yiliang Zhang
Ian J. Barnett
39
1
0
12 Jan 2025
AdaPRL: Adaptive Pairwise Regression Learning with Uncertainty Estimation for Universal Regression Tasks
Fuhang Liang
Rucong Xu
Deng Lin
OOD
33
0
0
10 Jan 2025
Mapping the Edge of Chaos: Fractal-Like Boundaries in The Trainability of Decoder-Only Transformer Models
Bahman Torkamandi
AI4CE
35
0
0
08 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
112
0
0
30 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Y. Wang
Kuan-Chieh Jackson Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
X. Wang
VGen
86
1
0
12 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu
Tengbo Yu
Haoyuan Deng
Season Si Chen
Yansong Tang
Ziwei Wang
75
3
0
09 Dec 2024
Autoregressive Action Sequence Learning for Robotic Manipulation
Xinyu Zhang
Yuhan Liu
Haonan Chang
Liam Schramm
Abdeslam Boularias
26
8
0
04 Oct 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
80
42
0
17 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
38
56
0
10 Jul 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
49
2
0
30 May 2024
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
26
3
0
29 May 2024
Interpretable Robotic Manipulation from Language
Boyuan Zheng
Jianlong Zhou
Fang Chen
LM&Ro
32
0
0
27 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
74
2
0
26 May 2024
Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W.Tsang
66
8
0
23 Feb 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace
Aliaksandr Siarohin
Ivan Skorokhodov
Ekaterina Deyneka
Tsai-Shien Chen
...
Yuwei Fang
A. Stoliar
Elisa Ricci
Jian Ren
Sergey Tulyakov
VGen
38
56
0
22 Feb 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
19
2
0
17 Jan 2024
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
Zixian Guo
Yuxiang Wei
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Yiwen Guo
Wangmeng Zuo
VLM
27
8
0
26 Dec 2023
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
Alexander Nikulin
Vladislav Kurenkov
Ilya Zisman
Artem Agarkov
Viacheslav Sinii
Sergey Kolesnikov
24
23
0
19 Dec 2023
Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?
Ibrahim Bouabdallaoui
Fatima Guerouate
Samya Bouhaddour
C. Saadi
Mohammed Sbihi
14
0
0
12 Dec 2023
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras
M. Aittala
J. Lehtinen
Janne Hellsten
Timo Aila
S. Laine
28
153
0
05 Dec 2023
Industrial Internet of Things Intelligence Empowering Smart Manufacturing: A Literature Review
Yujiao Hu
Qingmin Jia
Yuao Yao
Yong Lee
Mengjie Lee
Chenyi Wang
Xiaomao Zhou
Renchao Xie
F. Richard Yu
11
29
0
02 Dec 2023
RETSim: Resilient and Efficient Text Similarity
Marina Zhang
Owen Vallis
Aysegul Bumin
Tanay Vakharia
Elie Bursztein
23
1
0
28 Nov 2023
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
Zhonglin Cao
Simone Sciabola
Ye Wang
30
1
0
20 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
51
19
0
23 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
29
3
0
22 Aug 2023
CoNe: Contrast Your Neighbours for Supervised Image Classification
Mingkai Zheng
Shan You
Lang Huang
Xiu Su
Fei Wang
Chao Qian
Xiaogang Wang
Chang Xu
VLM
20
0
0
21 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
37
3
0
18 Aug 2023
Image Captions are Natural Prompts for Text-to-Image Models
Shiye Lei
Hao Chen
Senyang Zhang
Bo-Lu Zhao
Dacheng Tao
VLM
24
19
0
17 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
25
8
0
26 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
23
2
0
18 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
14
0
0
25 May 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong
Liang Ding
Juhua Liu
Xuebo Liu
Min Zhang
Bo Du
Dacheng Tao
VLM
27
9
0
24 May 2023
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Denis Tarasov
Vladislav Kurenkov
Alexander Nikulin
Sergey Kolesnikov
OffRL
25
36
0
16 May 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
32
2
0
09 May 2023
Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Guanyi Qin
R. Hu
Yutao Liu
Xiawu Zheng
Haotian Liu
Xiu Li
Yan Zhang
ViT
21
59
0
11 Apr 2023
An autoencoder compression approach for accelerating large-scale inverse problems
J. Wittmer
Jacob Badger
H. Sundar
T. Bui-Thanh
AI4CE
29
1
0
10 Apr 2023
SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Kfir Y. Levy
Kfir Y. Levy
FedML
40
2
0
09 Apr 2023
Tag that issue: Applying API-domain labels in issue tracking systems
Fabio Santos
Joseph Vargovich
Bianca Trinkenreich
Í. Santos
Jacob Penney
...
João Felipe Pimentel
I. Wiese
Igor Steinmacher
A. Sarma
M. Gerosa
8
4
0
06 Apr 2023
The Stable Signature: Rooting Watermarks in Latent Diffusion Models
Pierre Fernandez
Guillaume Couairon
Hervé Jégou
Matthijs Douze
Teddy Furon
WIGM
15
174
0
27 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
52
463
0
27 Mar 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
35
1
0
24 Mar 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
13
7
0
19 Feb 2023
Improving Training Stability for Multitask Ranking Models in Recommender Systems
Jiaxi Tang
Yoel Drori
Daryl Chang
M. Sathiamoorthy
Justin Gilmer
Li Wei
Xinyang Yi
Lichan Hong
Ed H. Chi
17
10
0
17 Feb 2023