ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.05526
10
10

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

12 July 2022
H. M. Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
    ViT
ArXivPDFHTML
Abstract

Video transformer naturally incurs a heavier computation burden than a static vision transformer, as the former processes TTT times longer sequence than the latter under the current attention of quadratic complexity (T2N2)(T^2N^2)(T2N2). The existing works treat the temporal axis as a simple extension of spatial axes, focusing on shortening the spatio-temporal sequence by either generic pooling or local windowing without utilizing temporal redundancy. However, videos naturally contain redundant information between neighboring frames; thereby, we could potentially suppress attention on visually similar frames in a dilated manner. Based on this hypothesis, we propose the LAPS, a long-term ``\textbf{\textit{Leap Attention}}'' (LA), short-term ``\textbf{\textit{Periodic Shift}}'' (\textit{P}-Shift) module for video transformers, with (2TN2)(2TN^2)(2TN2) complexity. Specifically, the ``LA'' groups long-term frames into pairs, then refactors each discrete pair via attention. The ``\textit{P}-Shift'' exchanges features between temporal neighbors to confront the loss of short-term dynamics. By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and neglectable computation overhead (∼\sim∼2.6\%). Experiments on the standard Kinetics-400 benchmark demonstrate that our LAPS transformer could achieve competitive performances in terms of accuracy, FLOPs, and Params among CNN and transformer SOTAs. We open-source our project in \sloppy \href{https://github.com/VideoNetworks/LAPS-transformer}{\textit{\color{magenta}{https://github.com/VideoNetworks/LAPS-transformer}}} .

View on arXiv
Comments on this paper