2112.01514
Self-supervised Video Transformer
2 December 2021
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
Papers citing
"Self-supervised Video Transformer"
50 / 61 papers shown
GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
Bin Wang
Ruotong Hu
Wenqian Wang
W. Li
Mingliang Gao
Runmin Cong
Wei Zhang
VLM
60
0
0
27 Nov 2025
Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion
Shanghong Li
Chiam Wen Qi Ruth
Hong Xu
Fang Liu
76
0
0
19 Sep 2025
FRAME: Pre-Training Video Feature Representations via Anticipation and Memory
Sethuraman TV
Savya Khosla
Vignesh Srinivasakumar
Jiahui Huang
Seoung Wug Oh
Simon Jenni
Derek Hoiem
Joon-Young Lee
154
1
0
05 Jun 2025
Heterogeneous Skeleton-Based Action Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Hongsong Wang
Xiaoyan Ma
Jidong Kuang
Jie Gui
223
5
0
04 Jun 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
867
3
0
08 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
302
3
0
01 Apr 2025
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan
Karthik Nandakumar
FedML
243
0
0
03 Feb 2025
IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare
Subrata Kumer Paul
Abu Saleh Musa Miah
Rakhi Rani Paul
Md Ekramul Hamid
Jungpil Shin
M. Rahim
188
1
0
13 Jan 2025
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
233
14
0
22 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
515
52
0
28 Jun 2024
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
Fahad Shahbaz Khan
Graham W. Taylor
192
7
0
21 Jun 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Yaoyao Liu
Cihang Xie
AI4TS
VGen
SSL
183
2
0
24 May 2024
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics
Jan Steckel
W. Jansen
Nico Huebel
MDE
143
0
0
21 May 2024
A Survey of Generative Techniques for Spatial-Temporal Data Mining
Qianru Zhang
Haixin Wang
Cheng Long
Liangcai Su
Xingwei He
...
Tailin Wu
Hongzhi Yin
Siu-Ming Yiu
Qi Tian
Christian S. Jensen
AI4TS
177
14
0
15 May 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Andrei Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
324
16
0
19 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
327
2
0
15 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
IEEE Transactions on Affective Computing (TAC), 2023
Guoying Zhao
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Yinan Han
Jianhua Tao
170
24
0
31 Dec 2023
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I. Dave
Simon Jenni
Mubarak Shah
155
12
0
20 Dec 2023
REACT: Recognize Every Action Everywhere All At Once
Machine Vision and Applications (MVA), 2023
N. V. R. Chappa
Pha Nguyen
P. Dobbs
Khoa Luu
196
6
0
27 Nov 2023
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
Matthew Walmer
Rose Kanjirathinkal
Kai Sheng Tai
Keyur Muzumdar
Taipeng Tian
Abhinav Shrivastava
ViT
331
0
0
17 Nov 2023
CycleCL: Self-supervised Learning for Periodic Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Matteo Destro
Michael Gygli
SSL
302
5
0
05 Nov 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
248
21
0
31 Oct 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
244
9
0
02 Sep 2023
LAC: Latent Action Composition for Skeleton-based Action Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Di Yang
Yaohui Wang
A. Dantcheva
Quan Kong
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
435
18
0
28 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
120
4
0
25 Aug 2023
Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations
IEEE International Conference on Computer Vision (ICCV), 2023
Mohammadreza Salehi
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
VOS
192
27
0
22 Aug 2023
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Neural Information Processing Systems (NeurIPS), 2023
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
390
15
0
20 Jul 2023
Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Zhao Wang
Chang Liu
Shaoting Zhang
Qi Dou
MedIm
385
95
0
29 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
229
3
0
09 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
268
1
0
02 Jun 2023
Modulate Your Spectrum in Self-Supervised Learning
International Conference on Learning Representations (ICLR), 2023
Xi Weng
Yu-Li Ni
Tengwei Song
Jie Luo
Rao Muhammad Anwer
Salman Khan
Fahad Shahbaz Khan
Lei Huang
174
8
0
26 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
239
13
0
23 May 2023
Self-Supervised Video Representation Learning via Latent Time Navigation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Di Yang
Yaohui Wang
Quan Kong
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
SSL
AI4TS
186
15
0
10 May 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Computer Vision and Pattern Recognition (CVPR), 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
203
107
0
06 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
243
0
0
01 Apr 2023
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Lei Wang
Piotr Koniusz
ViT
188
65
0
25 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
IEEE International Conference on Computer Vision (ICCV), 2023
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
272
12
0
20 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
210
21
0
06 Mar 2023
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
Scientific Reports (Sci Rep), 2023
N. H. Phong
B. Ribeiro
176
19
0
17 Feb 2023
Offline-to-Online Knowledge Distillation for Video Instance Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
H. Kim
Seunghun Lee
Sunghoon Im
OffRL
210
5
0
15 Feb 2023
Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis
IEEE International Conference on Computer Vision (ICCV), 2023
Yankai Jiang
Ming Sun
Heng Guo
Xiaoyu Bai
K. Yan
Le Lu
Minfeng Xu
MedIm
213
32
0
11 Feb 2023
ResFormer: Scaling ViTs with Multi-Resolution Training
Computer Vision and Pattern Recognition (CVPR), 2022
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
195
51
0
01 Dec 2022
Spatio-Temporal Crop Aggregation for Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Sepehr Sameni
Simon Jenni
Paolo Favaro
251
4
0
30 Nov 2022
TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos
IEEE International Conference on Robotics and Automation (ICRA), 2022
Tushar Sangam
I. Dave
Waqas Sultani
M. Shah
ViT
AI4TS
185
34
0
16 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
British Machine Vision Conference (BMVC), 2022
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
155
62
0
13 Oct 2022
Masked Motion Encoding for Self-Supervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Zhuliang Yu
Chuang Gan
242
42
0
12 Oct 2022
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
Yuxin Song
Min Yang
Wenhao Wu
Dongliang He
Fu Li
Jingdong Wang
ViT
235
11
0
11 Oct 2022
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Computer Vision and Pattern Recognition (CVPR), 2022
Ziyun Zeng
Yuying Ge
Xihui Liu
Bin Chen
Ping Luo
Shutao Xia
Yixiao Ge
AI4TS
165
9
0
30 Sep 2022
ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos
IEEE Access (IEEE Access), 2022
James Wensel
Hayat Ullah
Arslan Munir
ViT
166
58
0
16 Aug 2022
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
Journal of Imaging (JI), 2022
Hayat Ullah
Arslan Munir
HAI
146
40
0
09 Aug 2022