Shifted Chunk Transformer for Spatio-Temporal Representational Learning

Neural Information Processing Systems (NeurIPS), 2021
26 August 2021
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS, ViT

Papers citing "Shifted Chunk Transformer for Spatio-Temporal Representational Learning"

19 / 19 papers shown
Robust DDoS-Attack Classification with 3D CNNs Against Adversarial Methods
Landon Bragg
Nathan Dorsey
Josh Prior
John Ajit
Ben Kim
Nate Willis
Pablo Rivas
48
0
0
07 Sep 2025
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
273
25
0
22 May 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
215
7
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
130
5
0
08 Jan 2024
TPC-ViT: Token Propagation Controller for Efficient Vision Transformer
Wentao Zhu
278
2
0
03 Jan 2024
ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition
IEEE International Workshop on Multimedia Signal Processing (MMSP), 2023
Rachid Reda Dokkar
F. Chaieb
Hassen Drira
Arezki Aberkane
ViT
202
3
0
22 Oct 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
IEEE International Conference on Computer Vision (ICCV), 2023
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
153
26
0
08 Aug 2023
AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
Wentao Zhu
Yufang Huang
Xi Xie
Wenxian Liu
Jincan Deng
Debing Zhang
Zinan Lin
Ji Liu
136
22
0
12 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
106
3
0
04 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
150
6
0
01 Apr 2023
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
143
0
0
11 Nov 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
164
6
0
15 Oct 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
198
6
0
25 Aug 2022
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
European Conference on Computer Vision (ECCV), 2022
Yuetian Weng
Zizheng Pan
Mingfei Han
Xiaojun Chang
Bohan Zhuang
ViT
145
30
0
21 Jul 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
Fan Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT, MedIm
202
34
0
24 Mar 2022
Video Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
362
131
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
International Conference on Learning Representations (ICLR), 2022
Kunchang Li
Yali Wang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
359
317
0
12 Jan 2022
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
160
24
0
09 Dec 2021
Human Action Recognition from Various Data Modalities: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
504
673
0
22 Dec 2020