arXiv:2210.12924
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
24 October 2022
Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
Papers citing "OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks"
8 / 8 papers shown
Title
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Haotong Zhang, Ion Stoica
VLM
29
1,771
0
12 Sep 2023
Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont
19
6
0
03 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li, Chunlin Tian, Kahou Tam, Ruirui Ma, Li Li
21
2
0
17 Jun 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
144
366
0
13 Mar 2023
Overcoming Oscillations in Quantization-Aware Training
Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
MQ
106
98
0
21 Mar 2022
What is the State of Neural Network Pruning?
Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag
178
1,027
0
06 Mar 2020
Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices
Byung Hoon Ahn, Jinwon Lee, J. Lin, Hsin-Pai Cheng, Jilei Hou, H. Esmaeilzadeh
62
54
0
04 Mar 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam
3DH
948
20,549
0
17 Apr 2017