Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1604.06174
Cited By
Training Deep Nets with Sublinear Memory Cost
21 April 2016
Tianqi Chen
Bing Xu
Chiyuan Zhang
Carlos Guestrin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Training Deep Nets with Sublinear Memory Cost"
50 / 200 papers shown
Title
xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data
Jing Gong
Minsheng Hao
Xingyi Cheng
Xin Zeng
Chiming Liu
Jianzhu Ma
Xuegong Zhang
Taifeng Wang
Leo T. Song
31
17
0
26 Nov 2023
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li
Hao Tan
Kai Zhang
Zexiang Xu
Fujun Luan
Yinghao Xu
Yicong Hong
Kalyan Sunkavalli
Greg Shakhnarovich
Sai Bi
50
254
0
10 Nov 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
54
12
0
28 Oct 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
32
7
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
76
0
09 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
29
15
0
28 Sep 2023
Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
Xinyue Ma
Suyeon Jeong
Minjia Zhang
Di Wang
Jonghyun Choi
Myeongjae Jeon
CLL
16
13
0
11 Aug 2023
Towards General Text Embeddings with Multi-stage Contrastive Learning
Zehan Li
Xin Zhang
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
56
342
0
07 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
26
1
0
31 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li
Chunlin Tian
Kahou Tam
Ruirui Ma
Li Li
21
2
0
17 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
45
126
0
16 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
26
17
0
16 Jun 2023
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li
Zhiquan Lai
Yanqi Hao
Weijie Liu
Ke-shi Ge
Xiaoge Deng
Dongsheng Li
KaiCheng Lu
11
10
0
25 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
37
114
0
18 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Bin Cui
31
11
0
17 May 2023
TASTY: A Transformer based Approach to Space and Time complexity
K. Moudgalya
Ankit Ramakrishnan
Vamsikrishna Chemudupati
Xinghai Lu
14
3
0
06 May 2023
Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
Oscar Li
James Harrison
Jascha Narain Sohl-Dickstein
Virginia Smith
Luke Metz
44
5
0
21 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
45
74
0
10 Apr 2023
Training Neural Networks for Execution on Approximate Hardware
Tianmu Li
Shurui Li
Puneet Gupta
27
1
0
08 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
40
0
07 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
71
785
0
30 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
52
465
0
27 Mar 2023
An Evaluation of Memory Optimization Methods for Training Neural Networks
Xiaoxuan Liu
Siddharth Jha
Alvin Cheung
23
0
0
26 Mar 2023
MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation
Saikat Roy
Gregor Koehler
Constantin Ulrich
Michael Baumgartner
Jens Petersen
Fabian Isensee
Paul F. Jaeger
Klaus Maier-Hein
ViT
MedIm
29
139
0
17 Mar 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Xiaonan Nie
Yi Liu
Fangcheng Fu
J. Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Bin Cui
MoE
24
16
0
06 Mar 2023
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
Ziqi Pang
Jie Li
P. Tokmakov
Di Chen
Sergey Zagoruyko
Yu-xiong Wang
3DPC
30
47
0
07 Feb 2023
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu
Shenggui Li
Jiarui Fang
Yan Shao
Boyuan Yao
Yang You
OffRL
21
7
0
06 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
ExplainableFold: Understanding AlphaFold Prediction with Explainable AI
Juntao Tan
Yongfeng Zhang
20
6
0
27 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
16
25
0
24 Jan 2023
A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs
Fabian Falck
Christopher Williams
D. Danks
George Deligiannidis
C. Yau
Chris Holmes
Arnaud Doucet
M. Willetts
16
8
0
19 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
16
9
0
11 Jan 2023
Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
GNN
VLM
MoE
26
7
0
06 Jan 2023
CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion
Xingwei He
Yeyun Gong
Alex Jin
Hang Zhang
Anlei Dong
Jian Jiao
S. Yiu
Nan Duan
RALM
54
3
0
18 Dec 2022
On-device Training: A First Overview on Existing Systems
Shuai Zhu
Thiemo Voigt
Jeonggil Ko
Fatemeh Rahimian
34
14
0
01 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
40
10
0
01 Dec 2022
Towards Practical Few-shot Federated NLP
Dongqi Cai
Yaozong Wu
Haitao Yuan
Shangguang Wang
F. Lin
Mengwei Xu
FedML
29
6
0
01 Dec 2022
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
25
7
0
28 Nov 2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
35
24
0
25 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
64
674
0
14 Nov 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
33
3
0
14 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
32
12
0
10 Nov 2022
Attention-based Neural Cellular Automata
Mattie Tesfaldet
Derek Nowrouzezahrai
C. Pal
ViT
29
17
0
02 Nov 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
29
8
0
24 Oct 2022
An efficient deep neural network to find small objects in large 3D images
Jungkyu Park
Jakub Chlkedowski
Stanislaw Jastrzebski
Jan Witowski
Yan Xu
...
Melanie Wegener
Linda Moy
Laura Heacock
B. Reig
Krzysztof J. Geras
MedIm
16
1
0
16 Oct 2022
An In-depth Study of Stochastic Backpropagation
J. Fang
Ming Xu
Hao Chen
Bing Shuai
Z. Tu
Joseph Tighe
BDL
32
1
0
30 Sep 2022
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU
Jian-He Liao
Mingzhen Li
Qingxiao Sun
Jiwei Hao
F. Yu
...
Ye Tao
Zicheng Zhang
Hailong Yang
Zhongzhi Luan
D. Qian
21
4
0
06 Sep 2022
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil
Paras Jain
P. Dutta
Ion Stoica
Joseph E. Gonzalez
12
35
0
15 Jul 2022
Previous
1
2
3
4
Next