The long-context capability of Large Language Models (LLMs) has made significant breakthroughs, but the maximum supported context length remains a critical bottleneck in length extrapolation, limiting their practical applications. The constraint on context length in LLMs arises from the self-attention mechanism, which cannot effectively and efficiently capture the semantic relationships within infinitely long contexts via the limited pre-trained positional information and attention scope. In this work, we propose ReAttention, a training-free approach enabling LLMs based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources. ReAttention performs position-agnostic top-k attention before the ordinary position-aware self-attention, freeing LLMs from the length extrapolation issue. We validate the performance of ReAttention on LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods. Furthermore, we also apply ReAttention to mainstream LLMs, including LLaMA3.1-8B and Mistral-v0.3-7B, enabling them to support context lengths of at least 1M tokens and even expanding the context length of LLaMA3.2-3B-chat by 128× to 4M without any further training in Needle-In-A-Haystack tests. We also improve the efficiency of ReAttention with Triton and achieve efficient extrapolation without additional overhead. The code is available at this https URL.
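To make the two-stage idea in the abstract concrete, below is a minimal, illustrative PyTorch sketch, not the authors' released implementation: a position-agnostic top-k selection over the full KV cache is followed by ordinary position-aware attention over only the selected entries, whose positions are re-indexed contiguously so they stay within the pre-trained range. The `rotary` RoPE helper, the single-head unbatched shapes, and the placement of the query just after the selected keys are all assumptions made for clarity.

```python
import torch

def reattention_sketch(q, k, v, rotary, top_k):
    """Illustrative sketch of the two-stage attention described in the abstract.

    q: (n_q, d) queries; k, v: (n_kv, d) cached keys/values.
    rotary(x, pos) is a hypothetical RoPE helper applying positions `pos`
    along the sequence dimension of x.
    """
    d = q.shape[-1]

    # Stage 1: position-agnostic top-k selection (no RoPE applied to q or k).
    raw_scores = q @ k.T / d**0.5                   # (n_q, n_kv)
    idx = raw_scores.topk(top_k, dim=-1).indices    # (n_q, top_k)
    idx, _ = idx.sort(dim=-1)                       # preserve cache order of selected keys

    k_sel = k[idx]                                  # (n_q, top_k, d)
    v_sel = v[idx]

    # Stage 2: ordinary position-aware self-attention within the finite scope.
    # Selected keys get contiguous positions 0..top_k-1, so positions never
    # exceed the pre-trained range regardless of the true context length.
    key_pos = torch.arange(top_k)
    q_pos = torch.tensor([top_k])                   # query sits just after the selected keys
    q_rot = rotary(q.unsqueeze(1), q_pos).squeeze(1)        # (n_q, d)
    k_rot = rotary(k_sel, key_pos)                           # (n_q, top_k, d)

    scores = (q_rot.unsqueeze(1) @ k_rot.transpose(-1, -2)).squeeze(1) / d**0.5
    attn = torch.softmax(scores, dim=-1)            # (n_q, top_k)
    return (attn.unsqueeze(1) @ v_sel).squeeze(1)   # (n_q, d)
```

Because stage 1 never uses positional encodings, the KV cache can grow without bound (given enough memory), while stage 2 always operates over a fixed-size attention scope, which is the property that frees the model from length extrapolation.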
@article{liu2025_2407.15176,
  title   = {ReAttention: Training-Free Infinite Context with Finite Attention Scope},
  author  = {Xiaoran Liu and Ruixiao Li and Qipeng Guo and Zhigeng Liu and Yuerong Song and Kai Lv and Hang Yan and Linlin Li and Qun Liu and Xipeng Qiu},
  journal = {arXiv preprint arXiv:2407.15176},
  year    = {2025}
}