IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
arXiv:2405.02842, 5 May 2024

Papers citing "IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs" (5 papers)
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles X. Ling, Boyu Wang
24 Jan 2025

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, ..., K. Zhang, C. L. P. Chen, Fan Yang, Y. Yang, Lili Qiu
3 Jan 2025

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, L. Qiu
2 Jul 2024

SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr
8 Dec 2023

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut
25 Jul 2021