280
v1v2v3 (latest)

ReAttention: Training-Free Infinite Context with Finite Attention Scope

Xiaoran Liu
Ruixiao Li
Hang Yan
Linlin Li
Qun Liu
Xipeng Qiu
Main:13 Pages
12 Figures
Bibliography:1 Pages
7 Tables
Appendix:7 Pages
Abstract

The long-context capability of the Large Language Models (LLM) has made significant breakthroughs, but the maximum supported context length in length extrapolation remains a critical bottleneck limiting their practical applications. The constraint of context length in LLMs arises from the self-attention mechanism, which cannot effectively and efficiently capture the semantic relationships within infinitely long contexts via the limited pre-trained positional information and attention scope. In this work, we propose ReAttention, a training-free approach enabling LLM based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources. ReAttention performs the position-agnostic top-kk attention before the ordinary position-aware self-attention, freeing LLMs from the length extrapolation issue. We validate the performance of ReAttention on the LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods. Furthermore, we also apply ReAttention on mainstream LLMs, including LLaMA3.1-8B and Mistral-v0.3-7B, enabling them to support context lengths of at least 1M and even expanding the context length of LLaMA3.2-3B-chat by 128×\times to 4M without any further training in Needle-In-A-Haystack tests. We also improve the efficiency of ReAttention with Triton and achieve an efficient extrapolation without additional overhead. The code is available atthis https URL.

View on arXiv
Comments on this paper