Memory-Efficient Backpropagation Through Time
arXiv:1606.03401, 10 June 2016
A. Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

Papers citing "Memory-Efficient Backpropagation Through Time" (33 of 33 papers shown):

- Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning. Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Duc Nguyen, Toan M. Tran, David Hall, Cheongwoong Kang, Jaesik Choi. 03 Mar 2025.
- GPU Memory Usage Optimization for Backward Propagation in Deep Network Training. Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu. 18 Feb 2025.
- Memory-Efficient Fine-Tuning of Transformers via Token Selection. Antoine Simoulin, Namyong Park, Xiaoyi Liu, Grey Yang. 31 Jan 2025.
- Adding Conditional Control to Diffusion Models with Reinforcement Learning. Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali. 17 Jun 2024. [AI4CE]
- Long-term Dependency for 3D Reconstruction of Freehand Ultrasound Without External Tracker. Qi Li, Ziyi Shen, Qian Li, D. Barratt, T. Dowrick, Matthew J. Clarkson, Tom Kamiel Magda Vercauteren, Yipeng Hu. 16 Oct 2023.
- Brain-inspired learning in artificial neural networks: a review. Samuel Schmidgall, Jascha Achterberg, Thomas Miconi, Louis Kirsch, Rojin Ziaei, S. P. Hajiseyedrazi, Jason Eshraghian. 18 May 2023.
- RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network. Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, D. DeCoste. 28 Jun 2022.
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training. Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao. 28 Feb 2022.
- Survey on Large Scale Neural Network Training. Julia Gusak, Daria Cherniuk, Alena Shilova, A. Katrutsa, Daniel Bershatsky, ..., Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan V. Oseledets, Olivier Beaumont. 21 Feb 2022.
- Tutorial on amortized optimization. Brandon Amos. 01 Feb 2022. [OffRL]
- BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge. Abdelrahman I. Hosny, Marina Neseem, Sherief Reda. 29 Oct 2021. [MQ]
- Hydra: A System for Large Multi-Model Deep Learning. Kabir Nagrecha, Arun Kumar. 16 Oct 2021. [MoE, AI4CE]
- PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management. Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You. 12 Aug 2021. [VLM]
- Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory. Takashi Matsubara, Yuto Miyatake, Takaharu Yaguchi. 19 Feb 2021.
- Enabling Binary Neural Network Training on the Edge. Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, C. Coelho, S. Chatterjee, P. Cheung, G. Constantinides. 08 Feb 2021. [MQ]
- Dynamic Tensor Rematerialization. Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, Zachary Tatlock. 17 Jun 2020.
- Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory. Julien Herrmann, Olivier Beaumont, Lionel Eyraud-Dubois, Alexis Joly, Alena Shilova. 27 Nov 2019. [BDL]
- On-Device Machine Learning: An Algorithms and Learning Theory Perspective. Sauptik Dhar, Junyao Guo, Jiayi Liu, S. Tripathi, Unmesh Kurup, Mohak Shah. 02 Nov 2019.
- Adaptively Truncating Backpropagation Through Time to Control Gradient Bias. Christopher Aicher, N. Foti, E. Fox. 17 May 2019. [MQ]
- Generating Long Sequences with Sparse Transformers. R. Child, Scott Gray, Alec Radford, Ilya Sutskever. 23 Apr 2019.
- ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs. A. Gholami, Kurt Keutzer, George Biros. 27 Feb 2019.
- Training on the Edge: The why and the how. Navjot Kukreja, Alena Shilova, Olivier Beaumont, Jan Huckelheim, N. Ferrier, P. Hovland, Gerard Gorman. 13 Feb 2019.
- AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks. Jinrong Guo, Wantao Liu, Wang Wang, Q. Lu, Songlin Hu, Jizhong Han, Ruixuan Li. 21 Jan 2019.
- Supporting Very Large Models using Automatic Dataflow Graph Partitioning. Minjie Wang, Chien-chin Huang, Jinyang Li. 24 Jul 2018.
- Backdrop: Stochastic Backpropagation. Siavash Golkar, Kyle Cranmer. 04 Jun 2018.
- Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training. Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko. 22 May 2018.
- Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery. T. Stepleton, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, Rémi Munos. 13 May 2018.
- Dynamic Control Flow in Large-Scale Machine Learning. Yuan Yu, Martín Abadi, P. Barham, E. Brevdo, M. Burrows, ..., Michael Isard, M. Kudlur, R. Monga, D. Murray, Xiaoqiang Zheng. 04 May 2018. [AI4CE]
- Learning Longer-term Dependencies in RNNs with Auxiliary Losses. Trieu H. Trinh, Andrew M. Dai, Thang Luong, Quoc V. Le. 01 Mar 2018.
- Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. Tal Ben-Nun, Torsten Hoefler. 26 Feb 2018. [GNN]
- Unbiasing Truncated Backpropagation Through Time. Corentin Tallec, Yann Ollivier. 23 May 2017.
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean. 23 Jan 2017. [MoE]
- Automatic differentiation in machine learning: a survey. A. G. Baydin, Barak A. Pearlmutter, Alexey Radul, J. Siskind. 20 Feb 2015. [PINN, AI4CE, ODL]