Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.12602
Cited By
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 712 papers shown
Title
Past Movements-Guided Motion Representation Learning for Human Motion Prediction
Junyu Shi
Baoxuan Wang
3DH
29
0
0
04 Aug 2024
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
16
3
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
Hyper-parameter tuning for text guided image editing
Shiwen Zhang
DiffM
33
0
0
31 Jul 2024
Mitral Regurgitation Recogniton based on Unsupervised Out-of-Distribution Detection with Residual Diffusion Amplification
Zhe Liu
Xiliang Zhu
Tong Han
Yuhao Huang
Jian Wang
M. Werman
Fang Wang
Dong Ni
Zhongshan Gou
Xin Yang
39
0
0
31 Jul 2024
PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality
Xintong Zhang
Di Lu
Huiqi Hu
Nan Jiang
Xianhao Yu
Jinan Xu
Yujia Peng
Qing Li
Wenjuan Han
20
1
0
29 Jul 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
44
0
0
29 Jul 2024
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar
Namitha Padmanabhan
Luke Luo
Sai Saketh Rambhatla
Abhinav Shrivastava
28
4
0
25 Jul 2024
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos
Zsófia Katona
Seyed Sahand Mohamadi Ziabari
F. Karimi Nejadasl
14
0
0
25 Jul 2024
OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
Andrew Zisserman
27
2
0
24 Jul 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming-hui Sun
Chao Zhou
Jihong Zhu
24
3
0
23 Jul 2024
Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models
Thinesh Thiyakesan Ponbagavathi
Kunyu Peng
Alina Roitberg
35
1
0
22 Jul 2024
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
47
3
0
22 Jul 2024
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Fudong Lin
Jiadong Lou
Xu Yuan
Nianfeng Tzeng
ViT
AAML
18
1
0
22 Jul 2024
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryoske Fujii
Ryo Hachiuma
Hideo Saito
31
1
0
20 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
56
3
0
20 Jul 2024
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
Yurong Zhang
Honghao Chen
Xinyu Zhang
Xiangxiang Chu
Li Song
38
1
0
19 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Zeen Song
Jingyao Wang
Jianqi Zhang
Changwen Zheng
Wenwen Qiang
SSL
54
0
0
19 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Alan L. Yuille
VGen
22
1
0
18 Jul 2024
Towards Understanding Unsafe Video Generation
Yan Pang
Aiping Xiong
Yang Zhang
Tianhao Wang
EGVM
27
2
0
17 Jul 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju
Haicheng Wang
Haozhe Cheng
Xu Chen
Zhonghua Zhai
Weilin Huang
Jinsong Lan
Shuai Xiao
Bo Zheng
VLM
38
5
0
16 Jul 2024
AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder
Qiaoqiao Jin
Rui Shi
Yishun Dou
Bingbing Ni
CVBM
35
0
0
16 Jul 2024
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang
Zihao Xiao
Shikun Li
Fanzhao Lin
Jianmin Li
Shiming Ge
CVBM
32
9
0
15 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
34
7
0
11 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
32
3
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
44
1
0
09 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
37
4
0
09 Jul 2024
D-MASTER: Mask Annealed Transformer for Unsupervised Domain Adaptation in Breast Cancer Detection from Mammograms
Tajamul Ashraf
K. Rangarajan
Mohit Gambhir
Richa Gabha
Chetan Arora
MedIm
26
1
0
09 Jul 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
32
7
0
07 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
25
0
0
06 Jul 2024
ZARRIO @ Ego4D Short Term Object Interaction Anticipation Challenge: Leveraging Affordances and Attention-based models for STA
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
J. Guerrero-Campo
G. Farinella
21
0
0
05 Jul 2024
QueryMamba: A Mamba-Based Encoder-Decoder Architecture with a Statistical Verb-Noun Interaction Module for Video Action Forecasting @ Ego4D Long-Term Action Anticipation Challenge 2024
Zeyun Zhong
Manuel Martin
Frederik Diederichs
Juergen Beyerer
35
2
0
04 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
35
3
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
24
4
0
03 Jul 2024
Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning
Matteo Mosconi
Andriy Sorokin
Aniello Panariello
Angelo Porrello
Jacopo Bonato
Marco Cotogni
Luigi Sabetta
Simone Calderara
Rita Cucchiara
CLL
32
1
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
29
52
0
30 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
M. Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
34
37
0
27 Jun 2024
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
Zhuo Zheng
Stefano Ermon
Dongjun Kim
Liangpei Zhang
Yanfei Zhong
DiffM
38
19
0
26 Jun 2024
Video Occupancy Models
Manan Tomar
Philippe Hansen-Estruch
Philip Bachman
Alex Lamb
John Langford
Matthew E. Taylor
Sergey Levine
30
1
0
25 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
33
3
0
21 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
33
3
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
68
0
0
21 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
41
3
0
20 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
29
28
0
17 Jun 2024
GameVibe: A Multimodal Affective Game Corpus
M. Barthet
Maria Kaselimi
Kosmas Pinitas
Konstantinos Makantasis
Antonios Liapis
Georgios N. Yannakakis
20
3
0
17 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
28
4
0
16 Jun 2024
Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
Pengfei Gu
Yejia Zhang
Huimin Li
Chaoli Wang
D. Z. Chen
MedIm
37
1
0
15 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
30
2
0
15 Jun 2024
AVR: Synergizing Foundation Models for Audio-Visual Humor Detection
Sarthak Sharma
Orchid Chetia Phukan
Drishti Singh
Arun Balaji Buduru
Rajesh Sharma
23
0
0
15 Jun 2024
LieRE: Generalizing Rotary Position Encodings
Sophie Ostmeier
Brian Axelrod
Michael E. Moseley
Akshay S. Chaudhari
C. Langlotz
18
1
0
14 Jun 2024
Previous
1
2
3
4
5
6
...
13
14
15
Next