SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
