SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
arXiv 2410.02367 · 3 October 2024
Jintao Zhang, Jia Wei, Pengle Zhang, Jun-Jie Zhu, Jun Zhu, Jianfei Chen
Tags: VLM, MQ

Papers citing "SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration" (11 papers)

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang, Maneesh Agrawala
17 Apr 2025 · Tags: DiffM, VGen

Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag, Norbert Tihanyi, Merouane Debbah
26 Mar 2025 · Tags: ELM, OffRL, LRM, AI4CE

Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang, Jia Wei, Jintao Zhang, Jun-Jie Zhu, Jianfei Chen
13 Mar 2025 · Tags: MQ

Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali, Nikolos Gurney, David Pynadath
05 Mar 2025 · Tags: AI4TS

Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu, Weiyu Huang, Zichen Liang, C. L. P. Chen, Jintao Zhang, J. Zhu, Jianfei Chen
28 Feb 2025 · Tags: MQ

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, K. K., Amir Gholami
05 Feb 2025 · Tags: MQ

Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning
C. Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao
04 Feb 2025

SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun-Jie Zhu, Jianfei Chen
17 Nov 2024 · Tags: VLM, MQ

COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu, Y. Lu, Kurt Keutzer, Jianfei Chen, Song Han
25 Oct 2024 · Tags: MQ

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Widyadewi Soedarmadji, ..., Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang
04 Jun 2024 · Tags: MQ, VGen

Diffusion Bridge Implicit Models
Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu
24 May 2024 · Tags: DiffM