ViViT: A Video Vision Transformer

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
    ViT
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (3544★)

Papers citing "ViViT: A Video Vision Transformer"

50 / 1,308 papers shown
Title
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
APSIPA Transactions on Signal and Information Processing (TASIP), 2024
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
H. Wang
293
4
0
14 Nov 2024
Pay Attention to the Keys: Visual Piano Transcription Using Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Uros Zivanovic
Ivan Pilkov
Carlos Eduardo Cancino-Chacón
ViT
156
0
0
13 Nov 2024
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
364
3
0
11 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
European Conference on Computer Vision (ECCV), 2024
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT, AI4TS
169
9
0
10 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
225
25
0
07 Nov 2024
DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction
Li Zhao
Zhengmin Lu
VGen
177
0
0
07 Nov 2024
Can Language Models Enable In-Context Database?
Yu Pan
Hongfeng Yu
Tianjiao Zhao
Jianxin Sun
KELM, SyDa, LMTD
126
0
0
04 Nov 2024
Visual Fourier Prompt Tuning
Neural Information Processing Systems (NeurIPS), 2024
Runjia Zeng
Cheng Han
Qifan Wang
Chunshu Wu
Tong Geng
Lifu Huang
Ying Nian Wu
Dongfang Liu
VPVLM, VLM
396
26
0
02 Nov 2024
STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models
Zerui Wang
Yan Liu
280
6
0
01 Nov 2024
MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption
Ruixun Liu
Kaiyu Li
Jiayi Song
Dongwei Sun
Xiangyong Cao
VGen
176
2
0
31 Oct 2024
DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection
IEEE Transactions on Multimedia (IEEE TMM), 2024
Fan Nie
Jiangqun Ni
Jian Zhang
Bin Zhang
Weizhe Zhang
DiffM
231
6
0
31 Oct 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Neural Information Processing Systems (NeurIPS), 2024
Xiufeng Song
Xiao Guo
Junxuan Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
VGen, DiffM
608
28
0
31 Oct 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
192
20
0
30 Oct 2024
EEG-based Multimodal Representation Learning for Emotion Recognition
Balkan Conference in Informatics (BI), 2024
Kang Yin
Hye-Bin Shin
Dan Li
Seong-Whan Lee
177
8
0
29 Oct 2024
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Computer Vision and Image Understanding (CVIU), 2024
Manuel Benavent-Lledo
David Mulero-Pérez
David Ortiz-Perez
José García Rodríguez
Antonis Argyros
288
3
0
28 Oct 2024
Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving
Jiangming Wang
Xiao Yang
Zhenyu Wang
Ximeng Wei
Ange Wang
Dengbo He
Kaishun Wu
261
8
0
28 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
211
3
0
24 Oct 2024
Are Visual-Language Models Effective in Action Recognition? A Comparative Study
Mahmoud Ali
Di Yang
François Brémond
VLM
239
3
0
22 Oct 2024
Masked Differential Privacy
David Schneider
Sina Sajadmanesh
Vikash Sehwag
Saquib Sarfraz
Rainer Stiefelhagen
Lingjuan Lyu
Vivek Sharma
210
0
0
22 Oct 2024
Multimodal Learning for Embryo Viability Prediction in Clinical IVF
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Junsik Kim
Zhiyi Shi
Davin Jeong
Johannes Knittel
H. Yang
...
Wanhua Li
Yicong Li
D. Ben-Yosef
D. Needleman
Hanspeter Pfister
215
3
0
21 Oct 2024
SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers
Neural Information Processing Systems (NeurIPS), 2024
Parsa Esmati
Amirhossein Dadashzadeh
Vahid Goodarzi
Nicolas Larrosa
Nicolo Grilli
281
0
0
20 Oct 2024
ContextDet: Temporal Action Detection with Adaptive Context Aggregation
Ning Wang
Yun Xiao
Xiaopeng Peng
Xiaojun Chang
Xuanhong Wang
Dingyi Fang
323
4
0
20 Oct 2024
Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Kristina Telegraph
Christos Kyrkou
ObjD
225
1
0
17 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
210
7
0
17 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
223
14
0
16 Oct 2024
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Neural Information Processing Systems (NeurIPS), 2024
Zhi Wang
Li Zhang
Wenhao Wu
Yuanheng Zhu
Dongbin Zhao
C. L. Philip Chen
OffRL
216
15
0
15 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Neural Information Processing Systems (NeurIPS), 2024
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
226
3
0
14 Oct 2024
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification
Chen Mao
Chong Tan
Jingqi Hu
Min Zheng
164
2
0
13 Oct 2024
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond
Asian Conference on Computer Vision (ACCV), 2024
Md Tanvir Islam
Inzamamul Alam
Simon Woo
Saeed Anwar
IK Hyun Lee
Khan Muhammad
ViT
211
11
0
13 Oct 2024
Movie Trailer Genre Classification Using Multimodal Pretrained Features
Expert Systems with Applications (ESWA), 2024
Serkan Sulun
Paula Viana
M. Davies
CLIP
179
8
0
11 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
International Conference on Learning Representations (ICLR), 2024
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM, VGen
302
1
0
10 Oct 2024
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
Neural Information Processing Systems (NeurIPS), 2024
Fei Xie
Weijia Zhang
Zhongdao Wang
Chao Ma
Mamba
260
18
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
International Conference on Learning Representations (ICLR), 2024
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
618
276
0
09 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
Michael R. Lyu
Liwei Wang
VLM
172
6
0
08 Oct 2024
Linear Transformer Topological Masking with Graph Random Features
International Conference on Learning Representations (ICLR), 2024
Isaac Reid
Kumar Avinava Dubey
Deepali Jain
Will Whitney
Amr Ahmed
...
Connor Schenck
Richard E. Turner
René Wagner
Adrian Weller
Krzysztof Choromanski
224
4
0
04 Oct 2024
Action Selection Learning for Multi-label Multi-view Action Recognition
ACM Multimedia Asia (MMAsia), 2024
Trung Thanh Nguyen
Yasutomo Kawanishi
Takahiro Komamizu
Ichiro Ide
236
7
0
04 Oct 2024
ECHOPulse: ECG controlled echocardio-grams video generation
International Conference on Learning Representations (ICLR), 2024
Yiwei Li
Sekeun Kim
Zihao Wu
Hanqi Jiang
Yi Pan
...
Sifan Song
Yucheng Shi
Tianming Liu
Quanzheng Li
Xiang Li
VGen
167
4
0
04 Oct 2024
TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids
Novel Intelligent and Leading Emerging Sciences Conference (NILES), 2024
Mazen Balat
Mahmoud Essam Gabr
Hend Bakr
A. Zaky
66
6
0
01 Oct 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
293
4
0
30 Sep 2024
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
International Conference on Pattern Recognition (ICPR), 2024
Shukesh Reddy
Nishit Poddar
Srijan Das
Abhijit Das
CVBM
271
1
0
29 Sep 2024
Spiking Transformer with Spatial-Temporal Attention
Computer Vision and Pattern Recognition (CVPR), 2024
Donghyun Lee
Yuhang Li
Youngeun Kim
Shiting Xiao
Priyadarshini Panda
361
14
0
29 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
240
0
0
26 Sep 2024
MASSFormer: Mobility-Aware Spectrum Sensing using Transformer-Driven Tiered Structure
IEEE Communications Letters (IEEE Commun. Lett.), 2024
Dimpal Janu
Sandeep Mandia
Kuldeep Singh
Sandeep Kumar
183
0
0
26 Sep 2024
Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data
Lukas Heine
Fabian Horst
Jana Fragemann
Gijs Luijten
M. Balzer
Jan Egger
F. Bahnsen
M. Sarfraz
Jens Kleesiek
242
0
0
25 Sep 2024
Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics
Paul Garnier
J. Viquerat
E. Hachem
AI4CE
133
5
0
18 Sep 2024
DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2024
Songning Lai
Tianlang Xue
Songning Lai
Lijie Hu
Jiemin Wu
Ninghui Feng
Runwei Guan
Haicheng Liao
Zhenning Li
Yutao Yue
177
8
0
16 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
230
3
0
15 Sep 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
IEEE Access (IEEE Access), 2024
Awal Ahmed Fime
Saifuddin Mahmud
Arpita Das
Md. Sunzidul Islam
Hong-Hoon Kim
VGen, 3DV
235
2
0
14 Sep 2024
TabMixer: Noninvasive Estimation of the Mean Pulmonary Artery Pressure via Imaging and Tabular Data Mixing
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Michal K. Grzeszczyk
Przemysław Korzeniowski
S. Alabed
Andrew J Swift
Tomasz Trzciński
Arkadiusz Sitek
114
2
0
11 Sep 2024
Data Collection-free Masked Video Modeling
European Conference on Computer Vision (ECCV), 2024
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
166
1
0
10 Sep 2024