Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.16727
Cited By
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
29 March 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking"
50 / 223 papers shown
Title
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
33
3
0
21 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
41
3
0
20 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
31
0
0
12 Jun 2024
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Zijian Chen
Wei Sun
Yuan Tian
Jun Jia
Zicheng Zhang
Jiarui Wang
Ru Huang
Xiongkuo Min
Guangtao Zhai
Wenjun Zhang
EGVM
45
8
0
10 Jun 2024
SMART: Scene-motion-aware human action recognition framework for mental disorder group
Zengyuan Lai
Jiarui Yang
Songpengcheng Xia
Qi Wu
Zhen Sun
Wenxian Yu
Ling Pei
35
2
0
07 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
37
1
0
05 Jun 2024
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
Ryoske Fujii
Masashi Hatano
Hideo Saito
Hiroki Kajita
19
5
0
30 May 2024
The SkatingVerse Workshop & Challenge: Methods and Results
Jian Zhao
Lei Jin
Jianshu Li
Zheng Zhu
Yinglei Teng
...
Shiníchi Satoh
Yandong Guo
Cewu Lu
Junliang Xing
Jane Shengmei Shen
AI4TS
18
0
0
27 May 2024
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Qi Jia
Baoyu Fan
Cong Xu
Lu Liu
Liang Jin
Guoguang Du
Zhenhua Guo
Yaqian Zhao
Xuanjing Huang
Rengang Li
31
0
0
15 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
25
1
0
09 May 2024
Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba
Hongwei Ren
Yue Zhou
Jiadong Zhu
Haotian Fu
Yulong Huang
Xiaopeng Lin
Yuetong Fang
Fei Ma
Hao Yu
Bo-Xun Cheng
Mamba
38
9
0
09 May 2024
pFedLVM: A Large Vision Model (LVM)-Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving
Wei-Bin Kou
Qingfeng Lin
Ming Tang
Sheng Xu
Rongguang Ye
...
Shuai Wang
Guofa Li
Zhenyu Chen
Guangxu Zhu
Yik-Chung Wu
FedML
43
10
0
07 May 2024
Hierarchical Space-Time Attention for Micro-Expression Recognition
Haihong Hao
Shuo Wang
Huixia Ben
Yanbin Hao
Yansong Wang
Weiwei Wang
11
1
0
06 May 2024
SFMViT: SlowFast Meet ViT in Chaotic World
Jiaying Lin
Jiajun Wen
Mengyuan Liu
Jinfu Liu
Baiqiao Yin
Yue Li
ViT
33
1
0
25 Apr 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
Hao Chen
46
4
0
24 Apr 2024
On the Content Bias in Fréchet Video Distance
Jason S. Hoffman
Aniruddha Mahapatra
Gaurav Parmar
Jun-Yan Zhu
Jia-Bin Huang
EGVM
45
15
0
18 Apr 2024
Predicting Long-horizon Futures by Conditioning on Geometry and Time
Tarasha Khurana
Deva Ramanan
AI4TS
26
0
0
17 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
22
0
0
15 Apr 2024
The 8th AI City Challenge
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
Yue Yao
...
Xunlei Wu
S. Pusegaonkar
Yizhou Wang
Sujit Biswas
Rama Chellappa
28
31
0
15 Apr 2024
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang
Zhenyi Lin
Qilong Wang
Pengfei Zhu
Qinghua Hu
26
11
0
13 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
34
3
0
06 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
25
1
0
03 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
26
30
0
01 Apr 2024
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Akshita Gupta
Gaurav Mittal
Ahmed Magooda
Ye Yu
Graham W. Taylor
Mei Chen
44
2
0
01 Apr 2024
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng
Xiaoyong Chen
Jiaming Liang
Huisi Wu
Guangzhong Cao
Yong Guo
AAML
32
3
0
29 Mar 2024
Every Shot Counts: Using Exemplars for Repetition Counting in Videos
Saptarshi Sinha
Alexandros Stergiou
Dima Damen
36
5
0
26 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
ViT
28
6
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
27
1
0
24 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
30
4
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
24
104
0
22 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
43
4
0
21 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
40
3
0
20 Mar 2024
CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis
F. Maani
Numan Saeed
Aleksandr Matsun
Mohammad Yaqub
SyDa
50
3
0
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
68
0
14 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
27
0
0
13 Mar 2024
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu
Mayuna Gupta
Chetan Madan
Pankaj Gupta
Chetan Arora
28
4
0
13 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
26
10
0
08 Mar 2024
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
Zhenpeng Huang
Chao Li
Hao Chen
Yongjian Deng
Yifeng Geng
Limin Wang
29
2
0
01 Mar 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
23
9
0
29 Feb 2024
Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving
Yichen Xie
Hongge Chen
Gregory P. Meyer
Yong Jae Lee
Eric M. Wolff
Masayoshi Tomizuka
Wei Zhan
Yuning Chai
Xin Huang
3DPC
22
0
0
23 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
27
29
0
20 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
79
70
0
15 Feb 2024
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust
Bowen Shi
Skyler Wang
Necati Cihan Camgöz
Jean Maillard
SLR
37
14
0
14 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
19
1
0
14 Feb 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
14
1
0
12 Feb 2024
Taylor Videos for Action Recognition
Lei Wang
Xiuyuan Yuan
Tom Gedeon
Liang Zheng
26
2
0
05 Feb 2024
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
Pum Jun Kim
Seojun Kim
Jaejun Yoo
EGVM
11
3
0
30 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model
Till Grutschus
Ola Karrar
Emir Esenov
Ekta Vats
18
0
0
29 Jan 2024
MV2MAE: Multi-View Video Masked Autoencoders
Ketul Shah
Robert Crandall
Jie Xu
Peng Zhou
Marian George
Mayank Bansal
Rama Chellappa
15
0
0
29 Jan 2024
Previous
1
2
3
4
5
Next