QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

15 February 2024

Papers citing "QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference"

SliceGPT: Compress Large Language Models by Deleting Rows and Columns. Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman. 26 Jan 2024.