arXiv:2402.10076
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
15 February 2024
Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim
Papers citing "QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference"
1 / 1 papers shown
Title
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman
VLM
125
145
0
26 Jan 2024