OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks

24 October 2022

Papers citing "OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks"

Title | Authors | Tag | Counts | Date
Efficient Memory Management for Large Language Model Serving with PagedAttention | Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Haotong Zhang, Ion Stoica | VLM | 29 1,771 0 | 12 Sep 2023
Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch | Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont | - | 19 6 0 | 03 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey | Shitian Li, Chunlin Tian, Kahou Tam, Ruirui Ma, Li Li | - | 21 2 0 | 17 Jun 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang | - | 144 366 0 | 13 Mar 2023
Overcoming Oscillations in Quantization-Aware Training | Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort | MQ | 106 98 0 | 21 Mar 2022
What is the State of Neural Network Pruning? | Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag | - | 178 1,027 0 | 06 Mar 2020
Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices | Byung Hoon Ahn, Jinwon Lee, J. Lin, Hsin-Pai Cheng, Jilei Hou, H. Esmaeilzadeh | - | 62 54 0 | 04 Mar 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam | 3DH | 948 20,549 0 | 17 Apr 2017