Space-time Mixing Attention for Video Transformer

10 June 2021
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
    ViT
arXiv: 2106.05968

Papers citing "Space-time Mixing Attention for Video Transformer"

50 / 78 papers shown
Title
SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion
Yuseon Kim
Kyongseok Park
27
0
0
10 Apr 2025
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
73
0
0
20 Nov 2024
FE-Adapter: Adapting Image-based Emotion Classifiers to Videos
Shreyank N. Gowda
Boyan Gao
David A. Clifton
19
5
0
05 Aug 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
19
4
0
03 Jul 2024
Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification
Shulei Qiu
Wanqi Yang
Ming Yang
19
0
0
02 Jul 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
25
1
0
09 May 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
24
7
0
05 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
63
14
0
26 Mar 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
52
0
0
15 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
17
0
0
10 Jan 2024
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
23
3
0
21 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
26
5
0
04 Dec 2023
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Rameswar Panda
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
24
3
0
10 Nov 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
28
21
0
08 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
27
8
0
02 Oct 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
16
18
0
14 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
J. Denize
Mykola Liashuha
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
ViT
6
13
0
03 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
14
20
0
27 Aug 2023
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition
Yujun Ma
Benjia Zhou
Ruili Wang
Pichao Wang
SLR
11
9
0
23 Aug 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
11
0
0
21 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
22
9
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
16
16
0
08 Aug 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
13
22
0
14 Jul 2023
Free-Form Composition Networks for Egocentric Action Recognition
Haoran Wang
Qinghua Cheng
Baosheng Yu
Yibing Zhan
Dapeng Tao
Liang Ding
Haibin Ling
EgoV
33
0
0
13 Jul 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
27
9
0
25 May 2023
LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition
Fuyan Ma
Bin Sun
Shutao Li
ViT
14
20
0
05 May 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
21
14
0
17 Apr 2023
MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
Jian-jun Sun
H. H. Dodge
Mohammad H. Mahoor
12
13
0
11 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
10
2
0
04 Apr 2023
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
Giacomo Zara
Subhankar Roy
Paolo Rota
Elisa Ricci
VLM
11
12
0
03 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
13
0
0
01 Apr 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
22
12
0
30 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
C. L. P. Chen
M. Shah
TTA
26
13
0
28 Mar 2023
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Yawen Cui
Jiehua Zhang
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
24
79
0
07 Feb 2023
Optical Flow Estimation in 360° Videos: Dataset, Model and Application
Bin Duan
Keshav Bhandari
Gaowen Liu
Yan Yan
11
0
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
19
45
0
26 Jan 2023
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
20
22
0
12 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
16
86
0
08 Dec 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
S. Hwang
21
5
0
19 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
12
75
0
17 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Liming Wang
Yu Qiao
30
46
0
17 Nov 2022
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
Lihao Liu
Jean Prost
Lei Zhu
Nicolas Papadakis
Pietro Lio'
Carola-Bibiane Schönlieb
Angelica I Aviles-Rivero
8
22
0
13 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
12
0
0
11 Nov 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
22
4
0
15 Oct 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
10
2
0
15 Sep 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
25
4
0
25 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
14
1
0
23 Aug 2022
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
Wangmeng Xiang
C. Li
Biao Wang
Xihan Wei
Xiangpei Hua
Lei Zhang
ViT
15
26
0
27 Jul 2022