
From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

16 July 2021
K. Choromanski
Han Lin
Haoxian Chen
Tianyi Zhang
Arijit Sehanobish
Valerii Likhosherstov
Jack Parker-Holder
Tamás Sarlós
Adrian Weller
Thomas Weingarten
Abstract

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However, by casting the problem as a topological (graph-based) modulation of unmasked attention, we obtain several results unknown before, including efficient d-dimensional RPE-masking and graph-kernel masking. We leverage many mathematical techniques ranging from spectral analysis through dynamic programming and random walks to new algorithms for solving Markov processes on graphs. We provide a corresponding empirical evaluation.
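To illustrate the kind of scalability the abstract refers to: a relative-positional-encoding (RPE) mask over a 1D sequence of length L is a Toeplitz matrix, and Toeplitz-vector products can be evaluated in O(L log L) time by embedding the matrix in a circulant matrix and using the FFT. The sketch below shows only this standard primitive, under the assumption that the mask depends solely on the offset i - j; it is not code from the paper, and toeplitz_matvec and its arguments are hypothetical names.

import numpy as np

def toeplitz_matvec(first_col, first_row, x):
    """Multiply an L x L Toeplitz matrix T by a vector x in O(L log L).

    T is given by its first column (T[i, 0] = first_col[i]) and first row
    (T[0, j] = first_row[j]); first_col[0] must equal first_row[0].
    T is embedded into a (2L-1) x (2L-1) circulant matrix whose action on a
    zero-padded x is a circular convolution, computed with the FFT.
    """
    L = len(first_col)
    # First column of the circulant embedding:
    # [t_0, t_1, ..., t_{L-1}, t_{-(L-1)}, ..., t_{-1}]
    c = np.concatenate([first_col, first_row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(L - 1)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:L].real

# Usage: an RPE-style mask f(i - j) applied to a length-L vector,
# with f a hypothetical relative-position weight function.
if __name__ == "__main__":
    L = 6
    f = lambda d: np.exp(-abs(d))
    first_col = np.array([f(i) for i in range(L)])    # entries f(i - 0)
    first_row = np.array([f(-j) for j in range(L)])   # entries f(0 - j)
    x = np.random.randn(L)
    T = np.array([[f(i - j) for j in range(L)] for i in range(L)])
    assert np.allclose(toeplitz_matvec(first_col, first_row, x), T @ x)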
