ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.01514
  4. Cited By
Self-supervised Video Transformer

Self-supervised Video Transformer

2 December 2021
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
    ViT
ArXivPDFHTML

Papers citing "Self-supervised Video Transformer"

50 / 63 papers shown
Title
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
105
0
0
08 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
53
0
0
01 Apr 2025
SIGMA:Sinkhorn-Guided Masked Video Modeling
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
47
3
0
22 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
59
23
0
28 Jun 2024
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
F. Khan
Graham W. Taylor
21
3
0
21 Jun 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video
  Representation Learning
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan L. Yuille
Cihang Xie
AI4TS
VGen
SSL
44
1
0
24 May 2024
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air
  Sonar Images for Mobile Robotics
EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics
Jan Steckel
W. Jansen
Nico Huebel
MDE
25
0
0
21 May 2024
A Survey of Generative Techniques for Spatial-Temporal Data Mining
A Survey of Generative Techniques for Spatial-Temporal Data Mining
Qianru Zhang
Haixin Wang
Cheng Long
Liangcai Su
Xingwei He
...
Tailin Wu
Hongzhi Yin
S. Yiu
Qi Tian
Christian S. Jensen
AI4TS
49
7
0
15 May 2024
Understanding Video Transformers via Universal Concept Discovery
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
27
8
0
19 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
52
0
0
15 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
38
14
0
31 Dec 2023
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I. Dave
Simon Jenni
Mubarak Shah
25
7
0
20 Dec 2023
REACT: Recognize Every Action Everywhere All At Once
REACT: Recognize Every Action Everywhere All At Once
N. V. R. Chappa
Pha Nguyen
P. Dobbs
Khoa Luu
27
6
0
27 Nov 2023
Multi-entity Video Transformers for Fine-Grained Video Representation
  Learning
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
Matthew Walmer
Rose Kanjirathinkal
Kai Sheng Tai
Keyur Muzumdar
Taipeng Tian
Abhinav Shrivastava
ViT
11
0
0
17 Nov 2023
CycleCL: Self-supervised Learning for Periodic Videos
CycleCL: Self-supervised Learning for Periodic Videos
Matteo Destro
Michael Gygli
SSL
19
1
0
05 Nov 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked
  Autoencoders
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
22
16
0
31 Oct 2023
Self-Supervised Video Transformers for Isolated Sign Language
  Recognition
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
8
1
0
02 Sep 2023
LAC: Latent Action Composition for Skeleton-based Action Segmentation
LAC: Latent Action Composition for Skeleton-based Action Segmentation
Di Yang
Yaohui Wang
A. Dantcheva
Quan Kong
Lorenzo Garattoni
Gianpiero Francesca
F. Brémond
27
9
0
28 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring
  Multi-task Learning
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
9
4
0
25 Aug 2023
Time Does Tell: Self-Supervised Time-Tuning of Dense Image
  Representations
Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations
Mohammadreza Salehi
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
VOS
30
19
0
22 Aug 2023
Language-based Action Concept Spaces Improve Video Self-Supervised
  Learning
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
18
12
0
20 Jul 2023
Foundation Model for Endoscopy Video Analysis via Large-scale
  Self-supervised Pre-train
Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Zhao Wang
Chang Liu
Shaoting Zhang
Qi Dou
MedIm
13
54
0
29 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
14
3
0
09 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and
  Outlook of Recent Work
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
18
0
0
02 Jun 2023
Modulate Your Spectrum in Self-Supervised Learning
Modulate Your Spectrum in Self-Supervised Learning
Xi Weng
Yu-Li Ni
Tengwei Song
Jie Luo
Rao Muhammad Anwer
Salman Khan
F. Khan
Lei Huang
32
5
0
26 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
  Scale
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
21
9
0
23 May 2023
Self-Supervised Video Representation Learning via Latent Time Navigation
Self-Supervised Video Representation Learning via Latent Time Navigation
Di Yang
Yaohui Wang
Quan Kong
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
F. Brémond
SSL
AI4TS
33
10
0
10 May 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
VLM
VPVLM
22
73
0
06 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
13
0
0
01 Apr 2023
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action
  Recognition
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
Lei Wang
Piotr Koniusz
ViT
21
44
0
25 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
30
9
0
20 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group
  Activity Recognition
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
40
15
0
06 Mar 2023
Video Action Recognition Collaborative Learning with Dynamics via
  PSO-ConvNet Transformer
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
N. H. Phong
B. Ribeiro
19
15
0
17 Feb 2023
Offline-to-Online Knowledge Distillation for Video Instance Segmentation
Offline-to-Online Knowledge Distillation for Video Instance Segmentation
H. Kim
Seunghun Lee
Sunghoon Im
OffRL
36
3
0
15 Feb 2023
Anatomical Invariance Modeling and Semantic Alignment for
  Self-supervised Learning in 3D Medical Image Analysis
Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis
Yankai Jiang
Ming Sun
Heng Guo
Xiaoyu Bai
K. Yan
Le Lu
Minfeng Xu
MedIm
19
19
0
11 Feb 2023
ResFormer: Scaling ViTs with Multi-Resolution Training
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
11
30
0
01 Dec 2022
Spatio-Temporal Crop Aggregation for Video Representation Learning
Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni
Simon Jenni
Paolo Favaro
13
3
0
30 Nov 2022
TransVisDrone: Spatio-Temporal Transformer for Vision-based
  Drone-to-Drone Detection in Aerial Videos
TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos
Tushar Sangam
I. Dave
Waqas Sultani
M. Shah
ViT
AI4TS
14
27
0
16 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
How to Train Vision Transformer on Small-scale Datasets?
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
12
49
0
13 Oct 2022
Masked Motion Encoding for Self-Supervised Video Representation Learning
Masked Motion Encoding for Self-Supervised Video Representation Learning
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Mingkui Tan
Chuang Gan
27
28
0
12 Oct 2022
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised
  Video Transformer Pre-training
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
Yuxin Song
Min Yang
Wenhao Wu
Dongliang He
Fu Li
Jingdong Wang
ViT
92
8
0
11 Oct 2022
Learning Transferable Spatiotemporal Representations from Natural Script
  Knowledge
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Ziyun Zeng
Yuying Ge
Xihui Liu
Bin Chen
Ping Luo
Shutao Xia
Yixiao Ge
AI4TS
29
8
0
30 Sep 2022
ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human
  Activity Recognition in Videos
ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos
James Wensel
Hayat Ullah
Arslan Munir
ViT
14
41
0
16 Aug 2022
Human Activity Recognition Using Cascaded Dual Attention CNN and
  Bi-Directional GRU Framework
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
Hayat Ullah
Arslan Munir
HAI
11
27
0
09 Aug 2022
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens
  in 3D Space
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
Jinghuan Shang
Srijan Das
Michael S. Ryoo
36
13
0
23 Jun 2022
Self-Supervised Learning for Videos: A Survey
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Does Self-supervised Learning Really Improve Reinforcement Learning from
  Pixels?
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
Xiang Li
Jinghuan Shang
Srijan Das
Michael S. Ryoo
SSL
17
31
0
10 Jun 2022
SPAct: Self-supervised Privacy Preservation for Action Recognition
SPAct: Self-supervised Privacy Preservation for Action Recognition
I. Dave
C. L. P. Chen
M. Shah
PICV
11
55
0
29 Mar 2022
Self-supervised Video-centralised Transformer for Video Face Clustering
Self-supervised Video-centralised Transformer for Video Face Clustering
Yujiang Wang
Mingzhi Dong
Jie Shen
Yi-Si Luo
Yiming Lin
Pingchuan Ma
Stavros Petridis
M. Pantic
ViT
10
3
0
24 Mar 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
12
Next