LongHeads: Multi-Head Attention is Secretly a Long Context Processor

16 February 2024

Xuanjing Huang

Papers citing "LongHeads: Multi-Head Attention is Secretly a Long Context Processor"

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, C. Wang, ..., Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
26 Apr 2025

Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
Tags: LLMAG, KELM
03 Apr 2025

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021