v1v2v3v4v5 (latest)

SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization

17 November 2024

ArXiv (abs)PDF HTML HuggingFace (56 upvotes)

Papers citing "SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization"

9 / 9 papers shown

Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation

153

18 Aug 2025

SALE : Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling

208

30 May 2025

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache

275

05 Feb 2025

Open-Sora: Democratizing Efficient Video Production for All

754

482

31 Dec 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 TrainingInternational Conference on Learning Representations (ICLR), 2024

519

25 Oct 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

...

627

17 Oct 2024

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference AccelerationInternational Conference on Learning Representations (ICLR), 2024

Jun Zhu

Jianfei Chen

VLM MQ

702

03 Oct 2024

CogVideoX: Text-to-Video Diffusion Models with An Expert TransformerInternational Conference on Learning Representations (ICLR), 2024

Zhuoyi Yang

Wendi Zheng

...

Xiaotao Gu

Yuxiao Dong

Jie Tang

DiffM VGen

919

1,347

12 Aug 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

606

161

07 May 2024