© 2025 ResearchTrend.AI, All rights reserved.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking (arXiv:2303.16727)

29 March 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
    VGen

Papers citing "VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking"

50 / 223 papers shown
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
55
2
0
15 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
26
3
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
26
9
0
07 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
23
1
0
31 Oct 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Penghui Ruan
Pichao Wang
Divya Saxena
Jiannong Cao
Yuhui Shi
DiffM
VGen
24
0
0
31 Oct 2024
Learning Video Representations without Natural Videos
Xueyang Yu
Xinlei Chen
Yossi Gandelsman
VGen
AI4TS
39
0
0
31 Oct 2024
MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
Ruixun Liu
Kaiyu Li
Jiayi Song
Dongwei Sun
Xiangyong Cao
VGen
35
1
0
31 Oct 2024
ViMGuard: A Novel Multi-Modal System for Video Misinformation Guarding
Andrew Kan
Christopher Kan
Zaid Nabulsi
22
0
0
22 Oct 2024
ContextDet: Temporal Action Detection with Adaptive Context Aggregation
Ning Wang
Yun Xiao
Xiaopeng Peng
Xiaojun Chang
Xuanhong Wang
Dingyi Fang
14
2
0
20 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
63
7
0
16 Oct 2024
Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior Recognition
Cheng Liu
Xuyang Yan
Zekun Zhang
Cheng Ding
Tianhao Zhao
Shaya Jannati
Cynthia Martinez
Dietrich Stout
23
0
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
43
5
0
10 Oct 2024
The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
Yinan Han
Qingyuan Jiang
Hongming Mei
Yang Yang
Jinhui Tang
15
0
0
08 Oct 2024
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Ge Ya Luo
Gian Mario Favero
Zhi Hao Luo
Alexia Jolicoeur-Martineau
Christopher Pal
VGen
16
4
0
07 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
49
3
0
04 Oct 2024
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
Arun V. Reddy
Ketul Shah
Corban Rivera
William Paul
Celso M. De Melo
Rama Chellappa
SLR
16
0
0
03 Oct 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
23
0
0
30 Sep 2024
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
Min Yang
Zichen Zhang
Limin Wang
AI4TS
19
0
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
22
0
0
26 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning
Kosmas Pinitas
Konstantinos Makantasis
Georgios N. Yannakakis
14
1
0
19 Sep 2024
1M-Deepfakes Detection Challenge
Zhixi Cai
Abhinav Dhall
Shreya Ghosh
Munawar Hayat
D. Kollias
Kalin Stefanov
Usman Tariq
20
1
0
11 Sep 2024
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
19
1
0
10 Sep 2024
SVS-GAN: Leveraging GANs for Semantic Video Synthesis
Khaled M. Seyam
Julian Wiederer
Markus Braun
Bin Yang
22
0
0
09 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
24
1
0
30 Aug 2024
DLM-VMTL:A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
Zeyi Bo
Wuxi Sun
Ye Jin
VLM
27
0
0
29 Aug 2024
TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation
Anh-Dzung Doan
Vu Minh Hieu Phan
Surabhi Gupta
Markus Wagner
Tat-Jun Chin
Ian Reid
VGen
DiffM
30
0
0
26 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
41
1
0
23 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
24
0
0
20 Aug 2024
VrdONE: One-stage Video Visual Relation Detection
Xinjie Jiang
Chenxi Zheng
Xuemiao Xu
Bangzhen Liu
Weiying Zheng
Huaidong Zhang
Shengfeng He
VGen
VOS
34
3
0
18 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
27
1
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
59
6
0
13 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
27
1
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
18
1
0
08 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
32
0
0
07 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
18
1
0
05 Aug 2024
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition
Duc Manh Nguyen Dang
Viet-Hang Duong
Jia Ching Wang
Nhan Bui Duc
15
3
0
05 Aug 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
34
0
0
29 Jul 2024
Motion Capture from Inertial and Vision Sensors
Xiaodong Chen
Wu Liu
Qian Bao
Xinchen Liu
Quanwei Yang
Ruoli Dai
Tao Mei
40
3
0
23 Jul 2024
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
47
3
0
22 Jul 2024
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryoske Fujii
Ryo Hachiuma
Hideo Saito
31
1
0
20 Jul 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
27
0
0
18 Jul 2024
Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
Sangyoun Lee
Juho Jung
Changdae Oh
Sunghee Yun
42
0
0
18 Jul 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
25
2
0
16 Jul 2024
Weakly-supervised Autism Severity Assessment in Long Videos
Abid Ali
Mahmoud Ali
J. Odobez
Camilla Barbini
Séverine Dubuisson
Francois Bremond
Susanne Thümmler
17
0
0
12 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
31
4
0
11 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
32
3
0
03 Jul 2024
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang
Ziwei Zheng
Yizeng Han
Hao-Ran Cheng
Shiji Song
Gao Huang
Fan Li
51
8
0
03 Jul 2024
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Yurui Huang
Yang Yang
Shou Chen
Xiangyu Wu
Qingguo Chen
Jianfeng Lu
21
0
0
01 Jul 2024