Self-supervised Video Transformer

2 December 2021

Salman Khan

Papers citing "Self-supervised Video Transformer"

50 / 61 papers shown

GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models

131

27 Nov 2025

Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion

120

19 Sep 2025

FRAME: Pre-Training Video Feature Representations via Anticipation and Memory

Sethuraman TV

Savya Khosla

Vignesh Srinivasakumar

215

05 Jun 2025

Heterogeneous Skeleton-Based Action Representation Learning

Computer Vision and Pattern Recognition (CVPR), 2025

256

04 Jun 2025

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning

925

08 Apr 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Computer Vision and Pattern Recognition (CVPR), 2025

346

01 Apr 2025

A Framework for Double-Blind Federated Adaptation of Foundation Models

Nurbek Tastan

Karthik Nandakumar

FedML

322

03 Feb 2025

IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare

220

13 Jan 2025

SIGMA: Sinkhorn-Guided Masked Video Modeling

255

22 Jul 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

...

594

28 Jun 2024

Open-Vocabulary Temporal Action Localization using Multimodal Guidance

Akshita Gupta

Aditya Arora

Sanath Narayan

Salman Khan

Fahad Shahbaz Khan

Graham W. Taylor

210

21 Jun 2024

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

Cihang Xie

219

24 May 2024

EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics

165

21 May 2024

A Survey of Generative Techniques for Spatial-Temporal Data Mining

...

220

15 May 2024

Understanding Video Transformers via Universal Concept Discovery

M. Kowal

Achal Dave

Rares Andrei Ambrus

Adrien Gaidon

Konstantinos G. Derpanis

P. Tokmakov

ViT

424

19 Jan 2024

Collaboratively Self-supervised Video Representation Learning for Action Recognition

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024

381

15 Jan 2024

SVFAP: Self-supervised Video Facial Affect Perceiver

IEEE Transactions on Affective Computing (TAC), 2023

190

31 Dec 2023

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

I. Dave

Simon Jenni

Mubarak Shah

184

20 Dec 2023

REACT: Recognize Every Action Everywhere All At Once

Machine Vision and Applications (MVA), 2023

213

27 Nov 2023

Multi-entity Video Transformers for Fine-Grained Video Representation Learning

418

17 Nov 2023

CycleCL: Self-supervised Learning for Periodic Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Matteo Destro

Michael Gygli

SSL

372

05 Nov 2023

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

316

31 Oct 2023

Self-Supervised Video Transformers for Isolated Sign Language Recognition

Marcelo Sandoval-Castaneda

Yanhong Li

D. Brentari

Karen Livescu

Gregory Shakhnarovich

SLR

267

02 Sep 2023

LAC: Latent Action Composition for Skeleton-based Action Segmentation

IEEE International Conference on Computer Vision (ICCV), 2023

549

28 Aug 2023

Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning

139

25 Aug 2023

Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

IEEE International Conference on Computer Vision (ICCV), 2023

225

22 Aug 2023

Language-based Action Concept Spaces Improve Video Self-Supervised Learning

Neural Information Processing Systems (NeurIPS), 2023

Kanchana Ranasinghe

Michael S. Ryoo

SSL VLM

430

20 Jul 2023

Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023

441

102

29 Jun 2023

A Large-Scale Analysis on Self-Supervised Video Representation Learning

316

09 Jun 2023

Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work

Qiangchang Wang

Yilong Yin

300

02 Jun 2023

Modulate Your Spectrum in Self-Supervised Learning

International Conference on Learning Representations (ICLR), 2023

Rao Muhammad Anwer

Salman Khan

Fahad Shahbaz Khan

Lei Huang

236

26 May 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

Ying Shan

284

23 May 2023

Self-Supervised Video Representation Learning via Latent Time Navigation

AAAI Conference on Artificial Intelligence (AAAI), 2023

223

10 May 2023

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

Computer Vision and Pattern Recognition (CVPR), 2023

Salman Khan

232

110

06 Apr 2023

SVT: Supertoken Video Transformer for Efficient Video Understanding

Madian Khabsa

279

01 Apr 2023

3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2023

Lei Wang

Piotr Koniusz

ViT

224

25 Mar 2023

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

IEEE International Conference on Computer Vision (ICCV), 2023

328

20 Mar 2023

SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition

299

06 Mar 2023

Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer

Scientific Reports (Sci Rep), 2023

N. H. Phong

B. Ribeiro

278

17 Feb 2023

Offline-to-Online Knowledge Distillation for Video Instance Segmentation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

266

15 Feb 2023

Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis

IEEE International Conference on Computer Vision (ICCV), 2023

297

11 Feb 2023

ResFormer: Scaling ViTs with Multi-Resolution Training

Computer Vision and Pattern Recognition (CVPR), 2022

Zuxuan Wu

Yu Qiao

256

01 Dec 2022

Spatio-Temporal Crop Aggregation for Video Representation Learning

IEEE International Conference on Computer Vision (ICCV), 2022

Sepehr Sameni

Simon Jenni

Paolo Favaro

315

30 Nov 2022

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

IEEE International Conference on Robotics and Automation (ICRA), 2022

242

16 Oct 2022

How to Train Vision Transformer on Small-scale Datasets?

British Machine Vision Conference (BMVC), 2022

204

13 Oct 2022

Masked Motion Encoding for Self-Supervised Video Representation Learning

Computer Vision and Pattern Recognition (CVPR), 2022

Chuang Gan

289

12 Oct 2022

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

Jingdong Wang

261

11 Oct 2022

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Computer Vision and Pattern Recognition (CVPR), 2022

Ping Luo

210

30 Sep 2022

ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos

IEEE Access (IEEE Access), 2022

223

16 Aug 2022

Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework

Journal of Imaging (JI), 2022

Hayat Ullah

Arslan Munir

HAI

161

09 Aug 2022