Linear Transformer Topological Masking with Graph Random Features

4 October 2024
Isaac Reid
Kumar Avinava Dubey
Deepali Jain
Will Whitney
Amr Ahmed
Joshua Ainslie
Alex Bewley
M. Jacob
Aranyak Mehta
David Rendleman
Connor Schenck
Richard E. Turner
René Wagner
Adrian Weller
Krzysztof Choromanski
Abstract

When training transformers on graph-structured data, incorporating information about the underlying topology is crucial for good performance. Topological masking, a type of relative position encoding, achieves this by upweighting or downweighting attention depending on the relationship between the query and keys in a graph. In this paper, we propose to parameterise topological masks as a learnable function of a weighted adjacency matrix -- a novel, flexible approach which incorporates a strong structural inductive bias. By approximating this mask with graph random features (for which we prove the first known concentration bounds), we show how this can be made fully compatible with linear attention, preserving $\mathcal{O}(N)$ time and space complexity with respect to the number of input tokens. The fastest previous alternative was $\mathcal{O}(N \log N)$ and only suitable for specific graphs. Our efficient masking algorithms provide strong performance gains for tasks on image and point cloud data, including with $>30$k nodes.
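The key to the $\mathcal{O}(N)$ claim is that a multiplicative mask which factorises into low-rank node features (as a graph random feature estimate does) can be folded directly into the query/key feature maps of linear attention. Below is a minimal NumPy sketch of that idea, not the authors' implementation: `Psi` is a placeholder low-rank mask factor standing in for the paper's graph random features, and `positive_features` is a generic Performer-style positive random feature map. All names and dimensions are illustrative.

```python
import numpy as np

def positive_features(x, proj):
    """Positive random features approximating the softmax attention kernel."""
    u = x @ proj                                            # (N, d_f)
    return np.exp(u - 0.5 * np.sum(x**2, axis=-1, keepdims=True))

def masked_linear_attention(Q, K, V, Psi, proj):
    """Attention with a multiplicative topological mask M ~ Psi @ Psi.T,
    computed without ever forming an N x N matrix (linear in N)."""
    phi_q = positive_features(Q, proj)                      # (N, d_f)
    phi_k = positive_features(K, proj)                      # (N, d_f)
    # Fold mask factors into the features:
    # M_ij * <phi(q_i), phi(k_j)> = <psi_i (x) phi(q_i), psi_j (x) phi(k_j)>.
    q_feat = np.einsum('ng,nf->ngf', Psi, phi_q).reshape(len(Q), -1)
    k_feat = np.einsum('ng,nf->ngf', Psi, phi_k).reshape(len(K), -1)
    kv = k_feat.T @ V                                       # (d_g*d_f, d_v)
    norm = q_feat @ k_feat.sum(axis=0)                      # (N,)
    return (q_feat @ kv) / norm[:, None]

# Toy usage: a random non-negative factor Psi stands in for the GRF mask estimate.
N, d, d_f, d_g = 128, 16, 32, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
Psi = np.abs(rng.normal(size=(N, d_g)))
proj = rng.normal(size=(d, d_f))
out = masked_linear_attention(Q, K, V, Psi, proj)           # (N, d)
```

The cost is $\mathcal{O}(N \, d_g d_f d_v)$, so it stays linear in the number of tokens; the quality of the method then hinges on how well the graph random features approximate the learnable function of the weighted adjacency matrix, which is what the paper's concentration bounds address.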
