LieRE: Generalizing Rotary Position Encodings

Transformer architectures rely on position encodings to capture token dependencies. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which replaces RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity. Through extensive evaluation on three image datasets spanning 2D and 3D classification tasks, LieRE achieves a 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with CIFAR-100 results reproducible in 30 minutes on 4 A100 GPUs, and we release our code to facilitate further research.
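The core mechanism can be summarized compactly: instead of RoPE's fixed block-diagonal 2D rotations, each token's (possibly multi-dimensional) position is mapped to a rotation matrix by exponentiating a learned skew-symmetric generator that is linear in the position. The sketch below illustrates this idea; it is not the authors' released code, and the shapes and parameter names (head_dim, one generator per spatial axis) are assumptions made for illustration.

```python
# Minimal sketch of the LieRE idea, assuming PyTorch and a per-head
# key/query dimension of head_dim. Skew-symmetric generators guarantee
# that the matrix exponential is a rotation (an element of SO(head_dim)).
import torch

head_dim = 8   # per-head key/query dimension (assumed)
num_axes = 2   # spatial dimensionality, e.g. 2 for images (assumed)

# One learned generator per spatial axis.
raw = torch.nn.Parameter(torch.randn(num_axes, head_dim, head_dim) * 0.02)

def rotation_for_position(pos: torch.Tensor) -> torch.Tensor:
    """pos: (num_axes,) coordinates of a token. Returns a (head_dim, head_dim) rotation."""
    skew = raw - raw.transpose(-1, -2)                  # enforce skew-symmetry
    generator = (pos.view(-1, 1, 1) * skew).sum(dim=0)  # generator is linear in position
    return torch.matrix_exp(generator)                  # exp of a skew-symmetric matrix is a rotation

# Keys and queries are rotated before the attention dot product, so the score
# depends on positions only through the rotations; in the 1D, block-2D special
# case this recovers RoPE's relative-position structure.
pos_q, pos_k = torch.tensor([3.0, 5.0]), torch.tensor([1.0, 2.0])
q, k = torch.randn(head_dim), torch.randn(head_dim)
score = (rotation_for_position(pos_q) @ q) @ (rotation_for_position(pos_k) @ k)
```

Controlling how many off-diagonal entries of the generators are learned interpolates between RoPE-style block-2D rotations (maximally sparse) and fully dense rotations, which is the "variable sparsity" referred to above.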