Round and Round We Go! What makes Rotary Positional Encodings useful?

8 October 2024

Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Petar Veličković

Papers citing "Round and Round We Go! What makes Rotary Positional Encodings useful?"

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation. Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, C. Wang, ..., Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. 26 Apr 2025.
Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation. Manvi Agarwal, Changhong Wang, Gaël Richard. 07 Apr 2025.
On the Spatial Structure of Mixture-of-Experts in Transformers. Daniel Bershatsky, Ivan V. Oseledets. 06 Apr 2025.
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models. Yuheng Wu, Wentao Guo, Zirui Liu, Heng Ji, Zhaozhuo Xu, Denghui Zhang. 05 Apr 2025.
TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
How can representation dimension dominate structurally pruned LLMs? Mingxue Xu, Lisa Alazraki, Danilo P. Mandic. 06 Mar 2025.
Rotary Outliers and Rotary Offset Features in Large Language Models. André Jonasson. 03 Mar 2025.
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan. 08 Feb 2025.
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING. Connor Schenck, Isaac Reid, M. Jacob, Alex Bewley, Joshua Ainslie, ..., Matthias Minderer, Dmitry Kalashnikov, Jonathan Tompson, Vikas Sindhwani, Krzysztof Choromanski. 04 Feb 2025.
softmax is not enough (for sharp out-of-distribution). Petar Veličković, Christos Perivolaropoulos, Federico Barbero, Razvan Pascanu. 01 Oct 2024.