ResearchTrend.AI
Round and Round We Go! What makes Rotary Positional Encodings useful?


8 October 2024
Federico Barbero
Alex Vitvitskyi
Christos Perivolaropoulos
Razvan Pascanu
Petar Veličković

Papers citing "Round and Round We Go! What makes Rotary Positional Encodings useful?"

10 / 10 papers shown
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu
Wanxu Zhao
Xin Zhou
Chenxin An
C. Wang
...
Jun Zhao
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
26 Apr 2025
Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
Manvi Agarwal
Changhong Wang
Gaël Richard
07 Apr 2025
On the Spatial Structure of Mixture-of-Experts in Transformers
Daniel Bershatsky
Ivan V. Oseledets
MoE
06 Apr 2025
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
Yuheng Wu
Wentao Guo
Zirui Liu
Heng Ji
Zhaozhuo Xu
Denghui Zhang
05 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
29 Mar 2025
How can representation dimension dominate structurally pruned LLMs?
Mingxue Xu
Lisa Alazraki
Danilo P. Mandic
06 Mar 2025
Rotary Outliers and Rotary Offset Features in Large Language Models
André Jonasson
03 Mar 2025
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
Jingang Qu
David Holzmüller
Gaël Varoquaux
Marine Le Morvan
LMTD
08 Feb 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck
Isaac Reid
M. Jacob
Alex Bewley
Joshua Ainslie
...
Matthias Minderer
Dmitry Kalashnikov
Jonathan Tompson
Vikas Sindhwani
Krzysztof Choromanski
04 Feb 2025
softmax is not enough (for sharp out-of-distribution)
Petar Veličković
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
01 Oct 2024