arXiv: 2502.20766
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
28 February 2025
Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou
Papers citing "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" (4 / 4 papers shown)

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, E. Ponti
24 Apr 2025

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Y. Yang, Lili Qiu
22 Apr 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han
20 Mar 2025

Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali, Nikolos Gurney, David Pynadath (AI4TS)
05 Mar 2025