2103.15691
Cited By
v1
v2 (latest)
ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
HuggingFace (3 upvotes)
Github (3544★)
Papers citing
"ViViT: A Video Vision Transformer"
50 / 1,306 papers shown
Title
Revisiting Kernel Attention with Correlated Gaussian Process Representation
Conference on Uncertainty in Artificial Intelligence (UAI), 2025
Long Minh Bui
Tho Tran Huu
Duy-Tung Dinh
T. Nguyen
Trong Nghia Hoang
328
5
0
27 Feb 2025
Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking
Workshop on Hyperspectral Image and Signal Processing (WHISPERS), 2024
Shaheer Mohamed
Tharindu Fernando
Sridha Sridharan
Peyman Moghadam
Clinton Fookes
ViT
399
1
0
26 Feb 2025
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention
Pattern Recognition (Pattern Recogn.), 2024
Bochao Zou
Zizheng Guo
Jiansheng Chen
Junbao Zhuo
Weiran Huang
Huimin Ma
ViT
AI4TS
329
15
0
21 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Yufa Zhou
572
22
0
21 Feb 2025
MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching
Yen-Siang Wu
Chi-Pin Huang
Fu-En Yang
Yu-Jie Wang
DiffM
VGen
243
2
0
18 Feb 2025
Improving action segmentation via explicit similarity measurement
Kamel Aouaidjia
Wenhao Zhang
Aofan Li
Chongsheng Zhang
246
0
0
15 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
297
0
0
11 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
383
9
0
11 Feb 2025
Conformal Predictions for Human Action Recognition with Vision-Language Models
Bary Tim
Fuchs Clément
Macq Benoît
VLM
321
0
0
10 Feb 2025
HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang
Boxi Wu
Jiahui Zhang
Xiaotong Guan
Shuang Chen
VGen
232
7
0
02 Feb 2025
Cross-Modal Synergies: Unveiling the Potential of Motion-Aware Fusion Networks in Handling Dynamic and Static ReID Scenarios
Fuxi Ling
Hongye Liu
Guoqiang Huang
Jing Li
Hong Wu
Zhihao Tang
395
0
0
02 Feb 2025
SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
European Conference on Artificial Intelligence (ECAI), 2025
Jiawen Zhang
Kejia Chen
Zunlei Feng
Jian Lou
Weilong Dai
Qingbin Liu
Xiaoyu Yang
AAML
SILM
FedML
458
1
0
02 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
305
5
0
28 Jan 2025
Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI
Taymaz Akan
Sait Alp
Md. Shenuarin Bhuiyan
Elizabeth A. Disbrow
Steven A. Conrad
John A. Vanchiere
Christopher G. Kevil
M. A. N. Bhuiyan
MedIm
147
3
0
28 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Conference on Multimedia Modeling (MMM), 2025
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
409
4
0
22 Jan 2025
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao
M. Jogan
Marcel Hussing
Kenta Nakahashi
Kazuhiro Yasufuku
Amin Madani
Eric Eaton
Daniel A. Hashimoto
1.0K
2
0
21 Jan 2025
Counteracting temporal attacks in Video Copy Detection
Katarzyna Fojcik
Piotr Syga
AAML
217
0
0
19 Jan 2025
DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting
Knowledge Discovery and Data Mining (KDD), 2024
Hao Wu
Haomin Wen
Guibin Zhang
Yutong Xia
Kai Wang
Yuxuan Liang
Yu Zheng
Kun Wang
378
5
0
17 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
666
66
0
17 Jan 2025
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
303
5
0
14 Jan 2025
Scaling Up ESM2 Architectures for Long Protein Sequences Analysis: Long and Quantized Approaches
Brazilian Symposium on Bioinformatics (SBB), 2024
Gabriel Bianchin de Oliveira
Hélio Pedrini
Z. Dias
MQ
147
0
0
13 Jan 2025
Soft Vision-Based Tactile-Enabled SixthFinger: Advancing Daily Objects Manipulation for Stroke Survivors
International Conference on Soft Robotics (RoboSoft), 2025
Basma B. Hasanen
Mashood M. Mohsan
Abdulaziz Alkayas
F. Renda
Irfan Hussain
212
0
0
12 Jan 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Tanay Agrawal
Mohammed Guermal
Michal Balazia
François Brémond
193
0
0
08 Jan 2025
Measuring Error Alignment for Decision-Making Systems
AAAI Conference on Artificial Intelligence (AAAI), 2024
Binxia Xu
Antonis Bikakis
Daniel Onah
A. Vlachidis
Luke Dickens
372
1
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Computer Vision and Pattern Recognition (CVPR), 2023
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Celine Lee
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
248
38
0
31 Dec 2024
Predicting Chess Puzzle Difficulty with Transformers
BigData Congress [Services Society] (BSS), 2024
Szymon Miłosz
Paweł Kapusta
147
5
0
31 Dec 2024
DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images
Enbo Huang
Yuan Zhang
Faliang Huang
Guangyu Zhang
Wenshu Fan
DiffM
178
0
0
25 Dec 2024
Hierarchical Vector Quantization for Unsupervised Action Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Federico Spurio
Emad Bahrami
Gianpiero Francesca
Juergen Gall
318
8
0
23 Dec 2024
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Mingliang Xu
Yuyao Zhou
Yuxin Zhang
Shen Li
Shen Li
Jiayi Ji
Zhanpeng Zeng
Rongrong Ji
MQ
743
0
0
21 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
387
18
0
19 Dec 2024
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Computer Vision and Pattern Recognition (CVPR), 2024
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
262
5
0
18 Dec 2024
Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences
Antonios Gasteratos
Stavros N. Moutsis
Konstantinos A. Tsintotas
Yiannis Aloimonos
163
0
0
17 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
528
14
0
16 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
243
12
0
15 Dec 2024
Video Representation Learning with Joint-Embedding Predictive Architectures
Katrina Drozdov
Ravid Shwartz-Ziv
Yann LeCun
AI4TS
305
6
0
14 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Liang Luo
...
Liang Li
Houcheng Su
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
268
0
0
13 Dec 2024
Financial Fine-tuning a Large Time Series Model
IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), 2024
Xinghong Fu
Masanori Hirano
Kentaro Imajo
AI4TS
AIFin
310
6
0
13 Dec 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Chenyu Yang
Xuan Dong
X. Zhu
Weijie Su
Jiahao Wang
H. Tian
Zheyu Chen
Wenhai Wang
Lewei Lu
Jifeng Dai
VLM
196
9
0
12 Dec 2024
Multimodal Sentiment Analysis based on Video and Audio Inputs
Antonio Fernandez
Suzan Awinat
235
1
0
12 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yanjie Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
247
1
0
10 Dec 2024
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
Shansong Liu
Atin Sakkeer Hussain
Qilong Wu
Chenshuo Sun
Ying Shan
AuLLM
227
12
0
09 Dec 2024
Streaming Detection of Queried Event Start
Neural Information Processing Systems (NeurIPS), 2024
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
295
2
0
04 Dec 2024
Hybrid Spiking Neural Network -- Transformer Video Classification Model
Aaron Bateni
156
1
0
29 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2024
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
458
3
0
28 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
442
2
0
24 Nov 2024
When Spatial meets Temporal in Action Recognition
Huajun Chen
Lei Wang
Yuxiao Chen
Tom Gedeon
Piotr Koniusz
256
3
0
22 Nov 2024
Extending Video Masked Autoencoders to 128 frames
Neural Information Processing Systems (NeurIPS), 2024
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming-Hsuan Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
282
2
0
20 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
408
0
0
20 Nov 2024
Invariant Shape Representation Learning For Image Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Tonmoy Hossain
Jing Ma
Jundong Li
Miaomiao Zhang
305
5
0
19 Nov 2024
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
APSIPA Transactions on Signal and Information Processing (TASIP), 2024
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
H. Wang
293
4
0
14 Nov 2024