arXiv:2410.14740
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
17 October 2024
Jie Peng, Zhang Cao, Huaizhi Qu, Zhengyu Zhang, Chang Guo, Yanyong Zhang, Zhichao Cao, Tianlong Chen
Papers citing "Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching" (2 papers)
1. Taming the Titans: A Survey of Efficient LLM Inference Serving
   Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
   28 Apr 2025
2. Carbon Footprint Evaluation of Code Generation through LLM as a Service
   Tina Vartziotis, Maximilian Schmidt, George Dasoulas, Ippolyti Dellatolas, Stefano Attademo, Viet Dung Le, Anke Wiechmann, Tim Hoffmann, Michael Keckeisen, S. Kotsopoulos
   30 Mar 2025