2103.15691
Cited By
v1
v2 (latest)
ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (3544★)
Papers citing "ViViT: A Video Vision Transformer"
50 / 1,308 papers shown
Title
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
APSIPA Transactions on Signal and Information Processing (TASIP), 2024
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
H. Wang
293
4
0
14 Nov 2024
Pay Attention to the Keys: Visual Piano Transcription Using Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Uros Zivanovic
Ivan Pilkov
Carlos Eduardo Cancino-Chacón
ViT
156
0
0
13 Nov 2024
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
364
3
0
11 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
European Conference on Computer Vision (ECCV), 2024
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT
AI4TS
169
9
0
10 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
225
25
0
07 Nov 2024
DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction
Li Zhao
Zhengmin Lu
VGen
177
0
0
07 Nov 2024
Can Language Models Enable In-Context Database?
Yu Pan
Hongfeng Yu
Tianjiao Zhao
Jianxin Sun
KELM
SyDa
LMTD
126
0
0
04 Nov 2024
Visual Fourier Prompt Tuning
Neural Information Processing Systems (NeurIPS), 2024
Runjia Zeng
Cheng Han
Qifan Wang
Chunshu Wu
Tong Geng
Lifu Huang
Ying Nian Wu
Dongfang Liu
VPVLM
VLM
396
26
0
02 Nov 2024
STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models
Zerui Wang
Yan Liu
280
6
0
01 Nov 2024
MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
Ruixun Liu
Kaiyu Li
Jiayi Song
Dongwei Sun
Xiangyong Cao
VGen
176
2
0
31 Oct 2024
DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection
IEEE Transactions on Multimedia (IEEE TMM), 2024
Fan Nie
Jiangqun Ni
Jian Zhang
Bin Zhang
Weizhe Zhang
DiffM
231
6
0
31 Oct 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Neural Information Processing Systems (NeurIPS), 2024
Xiufeng Song
Xiao Guo
Junxuan Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
VGen
DiffM
608
28
0
31 Oct 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
192
20
0
30 Oct 2024
EEG-based Multimodal Representation Learning for Emotion Recognition
Balkan Conference in Informatics (BI), 2024
Kang Yin
Hye-Bin Shin
Dan Li
Seong-Whan Lee
177
8
0
29 Oct 2024
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Computer Vision and Image Understanding (CVIU), 2024
Manuel Benavent-Lledo
David Mulero-Pérez
David Ortiz-Perez
José García Rodríguez
Antonis Argyros
288
3
0
28 Oct 2024
Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving
Jiangming Wang
Xiao Yang
Zhenyu Wang
Ximeng Wei
Ange Wang
Dengbo He
Kaishun Wu
261
8
0
28 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
211
3
0
24 Oct 2024
Are Visual-Language Models Effective in Action Recognition? A Comparative Study
Mahmoud Ali
Di Yang
François Brémond
VLM
239
3
0
22 Oct 2024
Masked Differential Privacy
David Schneider
Sina Sajadmanesh
Vikash Sehwag
Saquib Sarfraz
Rainer Stiefelhagen
Lingjuan Lyu
Vivek Sharma
210
0
0
22 Oct 2024
Multimodal Learning for Embryo Viability Prediction in Clinical IVF
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Junsik Kim
Zhiyi Shi
Davin Jeong
Johannes Knittel
H. Yang
...
Wanhua Li
Yicong Li
D. Ben-Yosef
D. Needleman
Hanspeter Pfister
215
3
0
21 Oct 2024
SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers
Neural Information Processing Systems (NeurIPS), 2024
Parsa Esmati
Amirhossein Dadashzadeh
Vahid Goodarzi
Nicolas Larrosa
Nicolo Grilli
281
0
0
20 Oct 2024
ContextDet: Temporal Action Detection with Adaptive Context Aggregation
Ning Wang
Yun Xiao
Xiaopeng Peng
Xiaojun Chang
Xuanhong Wang
Dingyi Fang
323
4
0
20 Oct 2024
Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Kristina Telegraph
Christos Kyrkou
ObjD
225
1
0
17 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
210
7
0
17 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
223
14
0
16 Oct 2024
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Neural Information Processing Systems (NeurIPS), 2024
Zhi Wang
Li Zhang
Wenhao Wu
Yuanheng Zhu
Dongbin Zhao
C. L. Philip Chen
OffRL
216
15
0
15 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Neural Information Processing Systems (NeurIPS), 2024
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
226
3
0
14 Oct 2024
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
Chen Mao
Chong Tan
Jingqi Hu
Min Zheng
164
2
0
13 Oct 2024
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond
Asian Conference on Computer Vision (ACCV), 2024
Md Tanvir Islam
Inzamamul Alam
Simon Woo
Saeed Anwar
Ik Hyun Lee
Khan Muhammad
ViT
211
11
0
13 Oct 2024
Movie Trailer Genre Classification Using Multimodal Pretrained Features
Expert Systems with Applications (ESWA), 2024
Serkan Sulun
Paula Viana
M. Davies
CLIP
179
8
0
11 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
International Conference on Learning Representations (ICLR), 2024
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM
VGen
302
1
0
10 Oct 2024
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
Neural Information Processing Systems (NeurIPS), 2024
Fei Xie
Weijia Zhang
Zhongdao Wang
Chao Ma
Mamba
260
18
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
International Conference on Learning Representations (ICLR), 2024
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
618
276
0
09 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
Michael R. Lyu
Liwei Wang
VLM
172
6
0
08 Oct 2024
Linear Transformer Topological Masking with Graph Random Features
International Conference on Learning Representations (ICLR), 2024
Isaac Reid
Kumar Avinava Dubey
Deepali Jain
Will Whitney
Amr Ahmed
...
Connor Schenck
Richard E. Turner
René Wagner
Adrian Weller
Krzysztof Choromanski
224
4
0
04 Oct 2024
Action Selection Learning for Multi-label Multi-view Action Recognition
ACM Multimedia Asia (MMAsia), 2024
Trung Thanh Nguyen
Yasutomo Kawanishi
Takahiro Komamizu
Ichiro Ide
236
7
0
04 Oct 2024
ECHOPulse: ECG controlled echocardio-grams video generation
International Conference on Learning Representations (ICLR), 2024
Yiwei Li
Sekeun Kim
Zihao Wu
Hanqi Jiang
Yi Pan
...
Sifan Song
Yucheng Shi
Tianming Liu
Quanzheng Li
Xiang Li
VGen
167
4
0
04 Oct 2024
TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids
Novel Intelligent and Leading Emerging Sciences Conference (NILES), 2024
Mazen Balat
Mahmoud Essam Gabr
Hend Bakr
A. Zaky
66
6
0
01 Oct 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
293
4
0
30 Sep 2024
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
International Conference on Pattern Recognition (ICPR), 2024
Shukesh Reddy
Nishit Poddar
Srijan Das
Abhijit Das
CVBM
271
1
0
29 Sep 2024
Spiking Transformer with Spatial-Temporal Attention
Computer Vision and Pattern Recognition (CVPR), 2024
Donghyun Lee
Yuhang Li
Youngeun Kim
Shiting Xiao
Priyadarshini Panda
361
14
0
29 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
240
0
0
26 Sep 2024
MASSFormer: Mobility-Aware Spectrum Sensing using Transformer-Driven Tiered Structure
IEEE Communications Letters (IEEE Commun. Lett.), 2024
Dimpal Janu
Sandeep Mandia
Kuldeep Singh
Sandeep Kumar
183
0
0
26 Sep 2024
Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data
Lukas Heine
Fabian Horst
Jana Fragemann
Gijs Luijten
M. Balzer
Jan Egger
F. Bahnsen
M. Sarfraz
Jens Kleesiek
242
0
0
25 Sep 2024
Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics
Paul Garnier
J. Viquerat
E. Hachem
AI4CE
133
5
0
18 Sep 2024
DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2024
Songning Lai
Tianlang Xue
Lijie Hu
Jiemin Wu
Ninghui Feng
Runwei Guan
Haicheng Liao
Zhenning Li
Yutao Yue
177
8
0
16 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
230
3
0
15 Sep 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
IEEE Access (IEEE Access), 2024
Awal Ahmed Fime
Saifuddin Mahmud
Arpita Das
Md. Sunzidul Islam
Hong-Hoon Kim
VGen
3DV
235
2
0
14 Sep 2024
TabMixer: Noninvasive Estimation of the Mean Pulmonary Artery Pressure via Imaging and Tabular Data Mixing
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Michal K. Grzeszczyk
Przemysław Korzeniowski
S. Alabed
Andrew J Swift
Tomasz Trzciński
Arkadiusz Sitek
114
2
0
11 Sep 2024
Data Collection-free Masked Video Modeling
European Conference on Computer Vision (ECCV), 2024
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
166
1
0
10 Sep 2024