ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.01529
  4. Cited By
BEVT: BERT Pretraining of Video Transformers

BEVT: BERT Pretraining of Video Transformers

2 December 2021
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
    ViT
ArXivPDFHTML

Papers citing "BEVT: BERT Pretraining of Video Transformers"

34 / 34 papers shown
Title
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Guodong Shen
Yuqi Ouyang
Junru Lu
Yixuan Yang
Victor Sanchez
23
1
0
20 Apr 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
119
0
0
26 Mar 2025
VidTwin: Video VAE with Decoupled Structure and Dynamics
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Jiang Bian
DRL
VGen
73
3
0
23 Dec 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion
  Representation Learning
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
27
0
0
08 Apr 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
52
0
0
15 Jan 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
19
8
0
19 Dec 2023
Entangled View-Epipolar Information Aggregation for Generalizable Neural
  Radiance Fields
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min
Yawei Luo
Wei Yang
Yuesong Wang
Yi Yang
19
2
0
20 Nov 2023
Interpretability-Aware Vision Transformer
Interpretability-Aware Vision Transformer
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
74
7
0
14 Sep 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised
  RGB2Depth Adaptation
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
11
3
0
23 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
29
0
21 Aug 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation
  from Videos?
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
37
48
0
31 Jul 2023
Efficient Multimodal Fusion via Interactive Prompting
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
20
42
0
13 Apr 2023
Layer Grafted Pre-training: Bridging Contrastive Learning And Masked
  Image Modeling For Label-Efficient Representations
Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
Ziyu Jiang
Yinpeng Chen
Mengchen Liu
Dongdong Chen
Xiyang Dai
Lu Yuan
Zicheng Liu
Zhangyang Wang
SSL
VLM
CLIP
16
16
0
27 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
11
143
0
06 Feb 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Mingg-Ming Cheng
Jiashi Feng
22
8
0
15 Jan 2023
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language
  Pre-training
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
26
15
0
21 Nov 2022
Self-supervised Video Representation Learning with Motion-Aware Masked
  Autoencoders
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
18
19
0
09 Oct 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision
  and Beyond
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
36
70
0
30 Jul 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary
  Algorithm
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
25
32
0
19 Jun 2022
Self-Supervised Learning for Videos: A Survey
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
13
130
0
18 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
17
95
0
16 Jun 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
35
45
0
03 May 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal
  Action Localization
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
14
76
0
29 Mar 2022
ObjectFormer for Image Manipulation Detection and Localization
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
20
105
0
28 Mar 2022
Protecting Celebrities from DeepFake with Identity Consistency
  Transformer
Protecting Celebrities from DeepFake with Identity Consistency Transformer
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Ting Zhang
Weiming Zhang
Nenghai Yu
Dong Chen
Fang Wen
B. Guo
ViT
14
117
0
02 Mar 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
11
237
0
24 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Mobile-Former: Bridging MobileNet and Transformer
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
169
462
0
12 Aug 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
193
375
0
01 Feb 2021
ECO: Efficient Convolutional Network for Online Video Understanding
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
119
495
0
24 Apr 2018
1