Inference Optimization of Foundation Models on AI Accelerators

Inference Optimization of Foundation Models on AI Accelerators

12 July 2024

Kailash Budhathoki

Jonas M. Kübler

Matthäus Kleindessner

Papers citing "Inference Optimization of Foundation Models on AI Accelerators"

9 / 9 papers shown

Title
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators Krishna Teja Chitty-Venkata Siddhisanket Raskar B. Kale Farah Ferdaus Aditya Tanikanti Ken Raffenetti Valerie Taylor M. Emani V. Vishwanath 39 4 0 31 Oct 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks Albert Tseng Jerry Chee Qingyao Sun Volodymyr Kuleshov Christopher De Sa MQ 117 91 0 06 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Yichao Fu Peter Bailis Ion Stoica Hao Zhang 123 134 0 03 Feb 2024
The Falcon Series of Open Language Models Ebtesam Almazrouei Hamza Alobeidli Abdulaziz Alshamsi Alessandro Cappelli Ruxandra-Aimée Cojocaru ... Quentin Malartic Daniele Mazzotta Badreddine Noune B. Pannier Guilherme Penedo AI4TS ALM 107 389 0 28 Nov 2023
Dynamic Sparse Training with Structured Sparsity Mike Lasby A. Golubeva Utku Evci Mihai Nica Yani Andrew Ioannou 19 7 0 03 May 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation Z. Yao Xiaoxia Wu Cheng-rong Li Stephen Youn Yuxiong He MQ 63 56 0 15 Mar 2023
Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks Charith Peris Lizhen Tan Thomas Gueudré Turan Gojayev Vivi Wei Gokmen Oz 17 4 0 10 Oct 2022
Mixture-of-Experts with Expert Choice Routing Yan-Quan Zhou Tao Lei Han-Chu Liu Nan Du Yanping Huang Vincent Zhao Andrew M. Dai Zhifeng Chen Quoc V. Le James Laudon MoE 145 323 0 18 Feb 2022
Big Bird: Transformers for Longer Sequences Manzil Zaheer Guru Guruganesh Kumar Avinava Dubey Joshua Ainslie Chris Alberti ... Philip Pham Anirudh Ravula Qifan Wang Li Yang Amr Ahmed VLM 249 1,982 0 28 Jul 2020