TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

7 October 2024

Papers citing "TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention"


Title: Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Authors: Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty
Date: 25 Sep 2024