2303.16727
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
29 March 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
Papers citing
"VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking"
50 / 223 papers shown
Title
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
55
2
0
15 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
26
3
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
26
9
0
07 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
23
1
0
31 Oct 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Penghui Ruan
Pichao Wang
Divya Saxena
Jiannong Cao
Yuhui Shi
DiffM
VGen
24
0
0
31 Oct 2024
Learning Video Representations without Natural Videos
Xueyang Yu
Xinlei Chen
Yossi Gandelsman
VGen
AI4TS
39
0
0
31 Oct 2024
MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
Ruixun Liu
Kaiyu Li
Jiayi Song
Dongwei Sun
Xiangyong Cao
VGen
35
1
0
31 Oct 2024
ViMGuard: A Novel Multi-Modal System for Video Misinformation Guarding
Andrew Kan
Christopher Kan
Zaid Nabulsi
22
0
0
22 Oct 2024
ContextDet: Temporal Action Detection with Adaptive Context Aggregation
Ning Wang
Yun Xiao
Xiaopeng Peng
Xiaojun Chang
Xuanhong Wang
Dingyi Fang
14
2
0
20 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
63
7
0
16 Oct 2024
Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior Recognition
Cheng Liu
Xuyang Yan
Zekun Zhang
Cheng Ding
Tianhao Zhao
Shaya Jannati
Cynthia Martinez
Dietrich Stout
23
0
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
43
5
0
10 Oct 2024
The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
Yinan Han
Qingyuan Jiang
Hongming Mei
Yang Yang
Jinhui Tang
15
0
0
08 Oct 2024
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Ge Ya Luo
Gian Mario Favero
Zhi Hao Luo
Alexia Jolicoeur-Martineau
Christopher Pal
VGen
16
4
0
07 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
49
3
0
04 Oct 2024
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
Arun V. Reddy
Ketul Shah
Corban Rivera
William Paul
Celso M. De Melo
Rama Chellappa
SLR
16
0
0
03 Oct 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
23
0
0
30 Sep 2024
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
Min Yang
Zichen Zhang
Limin Wang
AI4TS
19
0
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
22
0
0
26 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning
Kosmas Pinitas
Konstantinos Makantasis
Georgios N. Yannakakis
14
1
0
19 Sep 2024
1M-Deepfakes Detection Challenge
Zhixi Cai
Abhinav Dhall
Shreya Ghosh
Munawar Hayat
D. Kollias
Kalin Stefanov
Usman Tariq
20
1
0
11 Sep 2024
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
19
1
0
10 Sep 2024
SVS-GAN: Leveraging GANs for Semantic Video Synthesis
Khaled M. Seyam
Julian Wiederer
Markus Braun
Bin Yang
22
0
0
09 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
24
1
0
30 Aug 2024
DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
Zeyi Bo
Wuxi Sun
Ye Jin
VLM
27
0
0
29 Aug 2024
TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation
Anh-Dzung Doan
Vu Minh Hieu Phan
Surabhi Gupta
Markus Wagner
Tat-Jun Chin
Ian Reid
VGen
DiffM
30
0
0
26 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
41
1
0
23 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
24
0
0
20 Aug 2024
VrdONE: One-stage Video Visual Relation Detection
Xinjie Jiang
Chenxi Zheng
Xuemiao Xu
Bangzhen Liu
Weiying Zheng
Huaidong Zhang
Shengfeng He
VGen
VOS
34
3
0
18 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
27
1
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
59
6
0
13 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
27
1
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
18
1
0
08 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
32
0
0
07 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
18
1
0
05 Aug 2024
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition
Duc Manh Nguyen Dang
Viet-Hang Duong
Jia Ching Wang
Nhan Bui Duc
15
3
0
05 Aug 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
34
0
0
29 Jul 2024
Motion Capture from Inertial and Vision Sensors
Xiaodong Chen
Wu Liu
Qian Bao
Xinchen Liu
Quanwei Yang
Ruoli Dai
Tao Mei
40
3
0
23 Jul 2024
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
47
3
0
22 Jul 2024
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryoske Fujii
Ryo Hachiuma
Hideo Saito
31
1
0
20 Jul 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
27
0
0
18 Jul 2024
Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
Sangyoun Lee
Juho Jung
Changdae Oh
Sunghee Yun
42
0
0
18 Jul 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
25
2
0
16 Jul 2024
Weakly-supervised Autism Severity Assessment in Long Videos
Abid Ali
Mahmoud Ali
J. Odobez
Camilla Barbini
Séverine Dubuisson
Francois Bremond
Susanne Thümmler
17
0
0
12 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
31
4
0
11 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
32
3
0
03 Jul 2024
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang
Ziwei Zheng
Yizeng Han
Hao-Ran Cheng
Shiji Song
Gao Huang
Fan Li
51
8
0
03 Jul 2024
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Yurui Huang
Yang Yang
Shou Chen
Xiangyu Wu
Qingguo Chen
Jianfeng Lu
21
0
0
01 Jul 2024