ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
    ViT
ArXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (3544★)

Papers citing "ViViT: A Video Vision Transformer"

50 / 1,311 papers shown
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
289
23
0
11 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
323
6
0
11 Jul 2024
Toto: Time Series Optimized Transformer for Observability
Ben Cohen
Emaad Khwaja
Kan Wang
Charles Masson
Elise Ramé
Youssef Doubli
Othmane Abou-Amal
AI4TS
267
15
0
10 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM, VGen
284
24
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
237
8
0
09 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
206
17
0
09 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
409
11
0
08 Jul 2024
Improving ensemble extreme precipitation forecasts using generative artificial intelligence
Yingkai Sha
Ryan Sobash
David John Gagne II
258
6
0
05 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
245
12
0
03 Jul 2024
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko
D. Avola
Bardh Prenkaj
Federico Fontana
Luigi Cinque
AI4TS
216
6
0
02 Jul 2024
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
Nicola Messina
J. Sedmidubský
Fabrizio Falchi
Tomáš Rebok
226
0
0
02 Jul 2024
TransferAttn: Transferable-guided Attention Is All You Need for Video Domain Adaptation
Andre Sacilotti
Samuel Felipe dos Santos
Andrii Zadaianchuk
Jurandy Almeida
ViT
253
0
0
01 Jul 2024
Aeroengine performance prediction using a physical-embedded data-driven method
Tong Mo
Shiran Dai
An Fu
Xiaomeng Zhu
Shuxiao Li
158
2
0
29 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
277
66
0
27 Jun 2024
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu
Peng Sun
Qinghao Hu
Ting Huang
Xun Chen
...
Jiarui Fang
Yonggang Wen
Tianwei Zhang
Xin Jin
Xuanzhe Liu
LRM
193
17
0
26 Jun 2024
Dark Transformer: A Video Transformer for Action Recognition in the Dark
Anwaar Ulhaq
ViT
230
0
0
25 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
238
7
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
303
6
0
21 Jun 2024
Exploring the Impact of Hand Pose and Shadow on Hand-washing Action Recognition
Shengtai Ju
A. Reibman
CVBM
138
2
0
19 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
196
16
0
19 Jun 2024
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement
Hao Wang
Euijoon Ahn
Jinman Kim
179
0
0
19 Jun 2024
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
Neural Information Processing Systems (NeurIPS), 2024
Tianqi Tang
Shohreh Deldari
Hao Xue
Celso De Melo
Flora D. Salim
CLL
273
5
0
19 Jun 2024
Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance
Eran Bamani Beeri
Eden Nissinman
A. Sintov
167
0
0
18 Jun 2024
LieRE: Lie Rotational Positional Encodings
Sophie Ostmeier
Brian Axelrod
Michael E. Moseley
Akshay S. Chaudhari
C. Langlotz
354
1
0
14 Jun 2024
Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark
Gaochang Wu
Yapeng Zhang
Lan Deng
Jingxin Zhang
Tianyou Chai
209
20
0
13 Jun 2024
Advanced Multimodal Deep Learning Architecture for Image-Text Matching
Jinyin Wang
Haijing Zhang
Yihao Zhong
Yingbin Liang
Rongwei Ji
Yiru Cang
347
28
0
13 Jun 2024
Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
Zhengqi Zhao
Xiaohu Huang
Hao Zhou
Kun Yao
Errui Ding
Jingdong Wang
Xinggang Wang
Wenyu Liu
Bin Feng
172
2
0
13 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
301
17
0
12 Jun 2024
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao
Yuanjun Xiong
Philipp Krahenbuhl
263
59
0
11 Jun 2024
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing
Jingdong Sun
Zejia Weng
Zuxuan Wu
Yu-Gang Jiang
VGen
295
23
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
583
26
1
09 Jun 2024
MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome
Biomedical Signal Processing and Control (BSPC), 2024
Yixin Huang
Yiqi Jin
Ke Tao
Kaijian Xia
Jianfeng Gu
Lei Yu
Haojie Li
Lan Du
C. Chen
344
0
0
07 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
VGen
646
31
0
06 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
333
2
0
05 Jun 2024
Population Transformer: Learning Population-level Representations of Neural Activity
Geeling Chau
Christopher Wang
Sabera Talukder
Vighnesh Subramaniam
Saraswati Soedarmadji
Yisong Yue
Boris Katz
Andrei Barbu
MedIm
412
21
0
05 Jun 2024
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
Xiang Wang
Shiwei Zhang
Changxin Gao
Jiayu Wang
Xiaoqiang Zhou
Yingya Zhang
Luxin Yan
Nong Sang
VGen
322
76
0
03 Jun 2024
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Khaled Alomar
Halil Ibrahim Aysel
Xiaohao Cai
MedIm, ViT
293
26
0
02 Jun 2024
Exploiting Frequency Correlation for Hyperspectral Image Reconstruction
Muge Yan
Lizhi Wang
Lin Zhu
Hua Huang
437
2
0
02 Jun 2024
DroneVis: Versatile Computer Vision Library for Drones
Ahmed Heakl
F. Youssef
Victor Parque
Walid Gomaa
AI4TS
221
2
0
01 Jun 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
230
4
0
28 May 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan
Hongbo Liu
Mading Li
Muyi Sun
Ming Sun
Jiachao Gong
Jinhua Hao
Chao Zhou
Yansong Tang
ViT
173
14
0
28 May 2024
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
Rui Zhang
Shuailong Li
Junxiao Xue
Feng Lin
Qing Zhang
Xiao Ma
Xiaoran Yan
271
1
0
28 May 2024
Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception
Shuangpeng Han
Ziyu Wang
Mengmi Zhang
272
1
0
26 May 2024
Planted: a dataset for planted forest identification from multi-satellite time series
L. M. Pazos-Outón
Cristina Nader Vasconcelos
Anton Raichuk
Anurag Arnab
Dan Morris
Maxim Neumann
200
8
0
24 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Yaoyao Liu
Cihang Xie
AI4TS, VGen, SSL
216
2
0
24 May 2024
Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks
Xuanle Zhao
Yue Sun
Tielin Zhang
Bo Xu
AI4CE
219
3
0
23 May 2024
Attending to Topological Spaces: The Cellular Transformer
Rubén Ballester
Pablo Hernández-García
Johan Mathe
Claudio Battiloro
Nina Miolane
Tolga Birdal
Carles Casacuberta
Sergio Escalera
Pavlo Vasylenko
323
6
0
23 May 2024
Scaling-laws for Large Time-series Models
Thomas D. P. Edwards
James Alvey
Justin Alsing
Nam H. Nguyen
Benjamin Dan Wandelt
AI4TS, AIFin
258
16
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
347
30
0
22 May 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie Zhang
Chang-bo Li
Shiguang Shan
243
0
0
21 May 2024