Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers

23 December 2024
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
Mingda Wan
arXiv:2412.18040
Abstract

Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities, addressing the limitations of classical matrix-based attention. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has shown superior performance in encoding positional information in long-context scenarios, significantly enhancing transformer models' expressiveness. Despite these empirical successes, the theoretical limitations of these technologies remain underexplored. In this study, we analyze the circuit complexity of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention, showing that with polynomial precision, constant-depth layers, and linear or sublinear hidden dimension, they cannot solve fixed membership problems or $(A_{F,r})^*$ closure problems, under the assumption that $\mathsf{TC}^0 \neq \mathsf{NC}^1$. These findings highlight a gap between the empirical performance and theoretical constraints of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention Transformers, offering insights that could guide the development of more theoretically grounded approaches to Transformer model design and scaling.
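
For readers unfamiliar with the rotary mechanism the paper builds on, the minimal NumPy sketch below shows standard $\mathsf{RoPE}$: adjacent coordinate pairs of a query or key vector are rotated by a position-dependent angle, so attention scores depend only on relative offsets. This is an illustrative sketch of plain RoPE under our own assumptions (the function name rope_rotate and its parameters are ours); it is not the paper's tensor-attention construction.

import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply Rotary Position Embedding (RoPE) to a batch of vectors.

    x:         (seq_len, d) array with d even; adjacent coordinate pairs are
               treated as 2-D planes and rotated by a position-dependent angle.
    positions: (seq_len,) integer token positions.
    Note: generic RoPE sketch, not the paper's tensor-attention formulation.
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "RoPE expects an even head dimension"
    # One rotation frequency per coordinate pair: theta_i = base^(-2i/d).
    freqs = base ** (-np.arange(0, d, 2) / d)        # (d/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split into coordinate pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: the query/key inner product after rotation
# depends only on the position difference (an offset of 3 in both cases below).
q = np.random.randn(1, 8)
k = np.random.randn(1, 8)
s1 = rope_rotate(q, np.array([5])) @ rope_rotate(k, np.array([2])).T
s2 = rope_rotate(q, np.array([13])) @ rope_rotate(k, np.array([10])).T
print(np.allclose(s1, s2))  # True

This relative-position property is what $\mathsf{RoPE}$-based Tensor Attention inherits; the circuit-complexity results above bound what constant-depth, polynomial-precision networks built from such layers can compute.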
