Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.16727
Cited By
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
29 March 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking"
50 / 223 papers shown
Title
Video Dataset Condensation with Diffusion Models
Zhe Li
Hadrien Reynaud
Mischa Dombrowski
Sarah Cechnicka
Franciskus Xaverius Erick
Bernhard Kainz
DD
VGen
37
0
0
10 May 2025
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim
Y. E. Lee
Jung-Ho Hong
Seong-Whan Lee
28
0
0
09 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
74
0
0
30 Apr 2025
PSG-MAE: Robust Multitask Sleep Event Monitoring using Multichannel PSG Reconstruction and Inter-channel Contrastive Learning
Yifei Wang
Qi Liu
Fuli Min
Honghao Wang
12
0
0
17 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
X. Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
27
0
0
17 Apr 2025
Evolutionary algorithms meet self-supervised learning: a comprehensive survey
Adriano Vinhas
João Correia
Penousal Machado
SSL
SyDa
59
0
0
09 Apr 2025
Post-processing for Fair Regression via Explainable SVD
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
58
0
0
04 Apr 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao
Pranav Virupaksha
Wenqi Jia
Bolin Lai
Fiona Ryan
Sangmin Lee
James M. Rehg
SLR
49
0
0
03 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
31
0
0
03 Apr 2025
FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection
Xinnan Zhu
Yicheng Zhu
Tixin Chen
Wentao Wu
Yuanjie Dang
46
0
0
01 Apr 2025
Understanding Co-speech Gestures in-the-wild
Sindhu B. Hegde
KR Prajwal
Taein Kwon
Andrew Zisserman
SLR
52
0
0
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
121
0
0
26 Mar 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Stefan Stojanov
David Wendt
Seungwoo Kim
R. Venkatesh
Kevin T. Feigelis
Jiajun Wu
Daniel L. K. Yamins
SSL
66
0
0
25 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
42
0
0
24 Mar 2025
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
Lei Li
55
0
0
22 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
0
0
17 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
49
0
0
17 Mar 2025
Multi Activity Sequence Alignment via Implicit Clustering
Taein Kwon
Zador Pataki
Mahdi Rad
Marc Pollefeys
HAI
AI4TS
58
0
0
16 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
42
0
0
16 Mar 2025
NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models
Mert Albaba
Chenhao Li
Markos Diomataris
Omid Taheri
Andreas Krause
M. Black
VGen
53
0
0
13 Mar 2025
Semantic Latent Motion for Portrait Video Generation
Qiyuan Zhang
Chenyu Wu
Wenzhang Sun
Huaize Liu
Donglin Di
Wei Chen
Changqing Zou
VGen
64
0
0
13 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Z. Wang
Bernard Ghanem
44
1
0
09 Mar 2025
End-to-End Action Segmentation Transformer
Tieqiao Wang
Sinisa Todorovic
ViT
37
0
0
08 Mar 2025
Spatial-Temporal Perception with Causal Inference for Naturalistic Driving Action Recognition
Qing Chang
Wei Dai
Zhihao Shuai
Limin Yu
Yutao Yue
CML
52
0
0
06 Mar 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
55
3
0
27 Feb 2025
EndoMamba: An Efficient Foundation Model for Endoscopic Videos
Qingyao Tian
Huai Liao
Xinyan Huang
Bingyu Yang
Dongdong Lei
Sebastien Ourselin
Hongbin Liu
Mamba
65
0
0
26 Feb 2025
L4P: Low-Level 4D Vision Perception Unified
Abhishek Badki
Hang Su
Bowen Wen
Orazio Gallo
VLM
75
1
0
18 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
59
3
0
17 Feb 2025
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
63
0
0
14 Feb 2025
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
R. Yasarla
Manish Kumar Singh
Hong Cai
Yunxiao Shi
Jisoo Jeong
Yinhao Zhu
Shizhong Han
Risheek Garrepalli
Fatih Porikli
MDE
80
5
0
17 Jan 2025
MVP: Multimodal Emotion Recognition based on Video and Physiological Signals
Valeriya Strizhkova
Hadi Kachmar
Hava Chaptoukaev
Raphael Kalandadze
Natia Kukhilava
...
Maria A. Zuluaga
Michal Balazia
A. Dantcheva
François Brémond
Laura M. Ferrari
30
0
0
06 Jan 2025
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Y. Li
46
2
0
27 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
77
3
0
19 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
67
0
0
18 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
158
0
0
18 Dec 2024
DINO-Foresight
\texttt{DINO-Foresight}
DINO-Foresight
: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
79
1
0
16 Dec 2024
Video Representation Learning with Joint-Embedding Predictive Architectures
Katrina Drozdov
Ravid Shwartz-Ziv
Yann LeCun
AI4TS
67
1
0
14 Dec 2024
EdgeOAR: Real-time Online Action Recognition On Edge Devices
Wei Luo
Deyu Zhang
Ying Tang
Fan Wu
Yaoxue Zhang
65
0
0
02 Dec 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
72
2
0
29 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
72
1
0
28 Nov 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Z. Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
87
11
0
27 Nov 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang
Wei Sun
Xinyue Li
Yunhao Li
Qihang Ge
...
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
EGVM
117
1
0
25 Nov 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
64
0
0
24 Nov 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou
Wenxuan Liu
Wenxin Huang
Xuemei Jia
X. Zhong
Chia-Wen Lin
CML
69
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
68
0
0
24 Nov 2024
When Spatial meets Temporal in Action Recognition
H. Chen
Lei Wang
Y. Chen
Tom Gedeon
Piotr Koniusz
81
2
0
22 Nov 2024
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
72
1
0
20 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
71
0
0
19 Nov 2024
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
23
0
0
16 Nov 2024
1
2
3
4
5
Next