A Survey on Efficient Training of Transformers
arXiv: 2302.01107
2 February 2023
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Papers citing "A Survey on Efficient Training of Transformers" (27 / 27 papers shown)

Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice
Benign John Ihugba, Afsana Nasrin, Ling Wu, Lin Li, Lijun Qian, Xishuang Dong
17 Apr 2025

A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li
21 Jan 2025

Does Self-Attention Need Separate Weights in Transformers?
Md. Kowsher, Nusrat Jahan Prottasha, Chun-Nam Yu, O. Garibay, Niloofar Yousefi
30 Nov 2024

Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
Tags: SLR
19 Nov 2024

Continuous Speech Tokenizer in Text To Speech
Yixing Li, Ruobing Xie, X. Sun, Yu Cheng, Zhanhui Kang
Tags: AuLLM, CLL
22 Oct 2024

Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models
Fei Liu, Zejun Kang, Xing Han
12 Aug 2024

Federating to Grow Transformers with Constrained Resources without Model Sharing
Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu
19 Jun 2024

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Ethan Smith, Nayan Saxena, Aninda Saha
Tags: DiffM
21 Feb 2024

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi
Tags: LRM
03 Feb 2024

A Survey on Structure-Preserving Graph Transformers
Van Thuy Hoang, O-Joun Lee
29 Jan 2024

Toward Responsible AI Use: Considerations for Sustainability Impact Assessment
Eva Thelisson, Grzegorz Mika, Quentin Schneiter, Kirtan Padh, Himanshu Verma
19 Dec 2023

Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari
14 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
01 Dec 2023

Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian, Elahe Khatibi, Iman Azimi, David Oniani, Zahra Shakeri Hossein Abad, ..., Bryant Lin, Olivier Gevaert, Li-Jia Li, Ramesh C. Jain, Amir M. Rahmani
Tags: LM&MA, ELM, AI4MH
21 Sep 2023

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel
18 Aug 2023

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu
Tags: VPVLM, VLM
25 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

Fauno: The Italian Large Language Model that will leave you senza parole!
Andrea Bacciu, Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Fabrizio Silvestri
26 Jun 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
Tags: VLM
07 Apr 2023

Towards Diverse Binary Segmentation via A Simple yet General Gated Network
Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang
18 Mar 2023

Knowledge Distillation in Vision Transformers: A Critical Review
Gousia Habib, Tausifa Jan Saleem, Brejesh Lall
04 Feb 2023

Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
Tags: ViT, TPM
11 Nov 2021

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan
Tags: AAML
07 Oct 2021

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
Tags: VPVLM
18 Apr 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
Tags: MoE
18 Jan 2021

On the Transformer Growth for Progressive BERT Training
Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, C. L. P. Chen, Jiawei Han
Tags: VLM
23 Oct 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE
17 Sep 2019