2106.05968
Cited By
Space-time Mixing Attention for Video Transformer
Neural Information Processing Systems (NeurIPS), 2021
10 June 2021
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
Papers citing
"Space-time Mixing Attention for Video Transformer"
50 / 77 papers shown
Smooth regularization for efficient video recognition
Gil Goldman
Raja Giryes
Mahadev Satyanarayanan
AI4TS
305
0
0
25 Nov 2025
Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing
Miao Cao
Siming Zheng
Lishun Wang
Ziyang Chen
D. Brady
Xin Yuan
221
0
0
10 Sep 2025
SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion
Yuseon Kim
Kyongseok Park
404
1
0
10 Apr 2025
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
540
3
0
20 Nov 2024
FE-Adapter: Adapting Image-based Emotion Classifiers to Videos
IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2024
Shreyank N. Gowda
Boyan Gao
David A. Clifton
267
10
0
05 Aug 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
313
15
0
03 Jul 2024
Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification
Shulei Qiu
Wanqi Yang
Ming Yang
288
5
0
02 Jul 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
193
3
0
09 May 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
369
30
0
05 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
344
38
0
26 Mar 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
519
19
0
29 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
478
12
0
18 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
516
3
0
15 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
300
0
0
10 Jan 2024
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
266
8
0
21 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
367
22
0
04 Dec 2023
Learning Human Action Recognition Representations Without Real Humans
Neural Information Processing Systems (NeurIPS), 2023
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Yikang Shen
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
369
9
0
10 Nov 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
298
30
0
08 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
European Conference on Computer Vision (ECCV), 2023
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
358
20
0
02 Oct 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
266
37
0
14 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
J. Denize
Mykola Liashuha
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
ViT
409
22
0
03 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
363
38
0
27 Aug 2023
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition
ACM Multimedia (ACM MM), 2023
Yujun Ma
Benjia Zhou
Ruili Wang
Pichao Wang
SLR
294
14
0
23 Aug 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
211
0
0
21 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
243
18
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
IEEE International Conference on Computer Vision (ICCV), 2023
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
253
28
0
08 Aug 2023
Multimodal Distillation for Egocentric Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
420
40
0
14 Jul 2023
Free-Form Composition Networks for Egocentric Action Recognition
Haoran Wang
Qinghua Cheng
Baosheng Yu
Yibing Zhan
Dapeng Tao
Liang Ding
Haibin Ling
EgoV
365
2
0
13 Jul 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Neurocomputing (Neurocomputing), 2023
Thanh-Dat Truong
Khoa Luu
EgoV
447
18
0
25 May 2023
LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Fuyan Ma
Bin Sun
Shutao Li
ViT
195
34
0
05 May 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
IEEE International Conference on Computer Vision (ICCV), 2023
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
361
31
0
17 Apr 2023
MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
Expert systems with applications (ESWA), 2023
Jian Sun
H. H. Dodge
Mohammad H. Mahoor
372
33
0
11 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
184
4
0
04 Apr 2023
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
Computer Vision and Pattern Recognition (CVPR), 2023
Giacomo Zara
Subhankar Roy
Paolo Rota
Elisa Ricci
VLM
313
26
0
03 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
366
1
0
01 Apr 2023
Streaming Video Model
Computer Vision and Pattern Recognition (CVPR), 2023
Yucheng Zhao
Chong Luo
Chuanxin Tang
DongDong Chen
Noel Codella
Zhengjun Zha
285
20
0
30 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
I. Dave
Mamshad Nayeem Rizve
Chong Chen
M. Shah
TTA
281
28
0
28 Mar 2023
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer
International Journal of Computer Vision (IJCV), 2023
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Yawen Cui
Jiehua Zhang
Juil Sock
Guoying Zhao
ViT
MedIm
280
127
0
07 Feb 2023
Optical Flow Estimation in 360° Videos: Dataset, Model and Application
Bin Duan
Keshav Bhandari
Gaowen Liu
Yan Yan
211
0
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
301
88
0
26 Jan 2023
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2022
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
390
47
0
12 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
435
127
0
08 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
International Journal of Computer Vision (IJCV), 2022
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
248
3
0
29 Nov 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
International Conference on Machine Learning (ICML), 2022
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
Sung Ju Hwang
463
16
0
19 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
268
172
0
17 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Limin Wang
Yu Qiao
188
60
0
17 Nov 2022
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
Computer Vision and Pattern Recognition (CVPR), 2022
Lihao Liu
Jean Prost
Lei Zhu
Nicolas Papadakis
Pietro Lio
Carola-Bibiane Schönlieb
Angelica I Aviles-Rivero
307
44
0
13 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
225
0
0
11 Nov 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
249
7
0
15 Oct 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
311
3
0
15 Sep 2022