On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

7 April 2023 · arXiv:2304.03589
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao [VLM]

Papers citing "On Efficient Training of Large-Scale Deep Learning Models: A Literature Review" (45 of 45 papers shown)
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon, Avishai Elmakies, Yossi Adi · 19 Feb 2025

Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics
Daniel Paulin, P. Whalley, Neil K. Chada, B. Leimkuhler [BDL] · 14 Oct 2024

Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Johnny Jingze Li, V. George, Gabriel A. Silva [ODL] · 26 Jul 2024

Frontiers of Deep Learning: From Novel Application to Real-World Deployment
Rui Xie [VLM] · 19 Jul 2024

Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning
Mani Hamidi, Sina Khajehabdollahi, E. Giannakakis, Tim Schäfer, Anna Levina, Charley M. Wu · 10 Jun 2024

A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin · 31 Mar 2024

Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation
Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, Dacheng Tao [LM&MA] · 20 Feb 2024

Testing the Segment Anything Model on radiology data
J. Almeida, N. M. Rodrigues, Sara Silva, Nickolas Papanikolaou [MedIm, VLM] · 20 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang · 01 Dec 2023

Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang, Yue Chen, Zhu Li [ELM, AI4CE, LRM, ALM, LM&Ro] · 20 Nov 2023

Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units
Jake Ryland Williams, Haoran Zhao · 13 Nov 2023

Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks
Jake Ryland Williams, Haoran Zhao · 13 Nov 2023

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao [MQ, VLM] · 20 Oct 2023

Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Junjie Yang, Liang Ding, Li Shen, Matthieu Labeau, Yibing Zhan, Weifeng Liu, Dacheng Tao [VLM] · 28 Sep 2023

Sparks of Large Audio Models: A Survey and Outlook
S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller [LM&MA, AuLLM] · 24 Aug 2023

LGViT: Dynamic Early Exiting for Accelerating Vision Transformer
Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, J. Shen · 01 Aug 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

Temporal Data Meets LLM -- Explainable Financial Time Series Forecasting
Xinli Yu, Zheng Chen, Yuan Ling, Shujing Dong, Zongying Liu, Yanbin Lu [AIFin, AI4TS] · 19 Jun 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao · 05 Jun 2023

Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao [VLM] · 24 May 2023

What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy [MoE, AI4CE] · 27 Oct 2022

ButterflyFlow: Building Invertible Layers with Butterfly Matrices
Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, Stefano Ermon [TPM] · 28 Sep 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li, Kazuki Osawa, Torsten Hoefler · 14 Sep 2022

Efficient-Adam: Communication-Efficient Distributed Adam
Congliang Chen, Li Shen, Wei Liu, Z. Luo · 28 May 2022

Trainable Weight Averaging: Accelerating Training and Improving Generalization
Tao Li, Zhehao Huang, Yingwen Wu, Zhengbao He, Qinghua Tao, X. Huang, Chih-Jen Lin [MoMe] · 26 May 2022

Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning
Yuzhong Chen, Zhe Xiao, Lin Zhao, Lu Zhang, Haixing Dai, ..., Tuo Zhang, Changying Li, Dajiang Zhu, Tianming Liu, Xi Jiang · 20 May 2022

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le · 21 Feb 2022

Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick [ViT, TPM] · 11 Nov 2021

SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani, A. Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay · 18 Oct 2021

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher · 16 Oct 2021

Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini [SyDa] · 14 Jul 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai · 12 Jul 2021

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 04 May 2021

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever [VLM] · 24 Feb 2021

Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song · 05 Feb 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He [MoE] · 18 Jan 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy [AIMat] · 31 Dec 2020

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis · 31 Dec 2020

On the Transformer Growth for Progressive BERT Training
Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, C. L. P. Chen, Jiawei Han [VLM] · 23 Oct 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang [LM&MA, VLM] · 18 Mar 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro [MoE] · 17 Sep 2019

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li · 04 Dec 2018

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL] · 15 Sep 2016

ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei [VLM, ObjD] · 01 Sep 2014