ResearchTrend.AI
ViViT: A Video Vision Transformer

29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
    ViT

Papers citing "ViViT: A Video Vision Transformer"

50 / 237 papers shown
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
33
8
0
18 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
28
3
0
17 Jul 2023
Transformer-based end-to-end classification of variable-length volumetric data
Marzieh Oghbaie
Teresa Araújo
T. Emre
U. Schmidt-Erfurth
Hrvoje Bogunović
ViT
MedIm
11
3
0
13 Jul 2023
SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network
Hamza Bouzid
Lahoucine Ballihi
ViT
3DH
13
2
0
30 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
Towards Consistent Video Editing with Text-to-Image Diffusion Models
Zicheng Zhang
Bonan Li
Xuecheng Nie
Congying Han
Tiande Guo
Luoqi Liu
DiffM
10
24
0
27 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
46
4
0
25 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
André O. Françani
Marcos R. O. A. Máximo
19
8
0
10 May 2023
Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization
Xijun Wang
Aggelos K. Katsaggelos
20
0
0
07 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
58
6
0
05 May 2023
Improve Video Representation with Temporal Adversarial Augmentation
Jinhao Duan
Quanfu Fan
Hao-Ran Cheng
Xiaoshuang Shi
Kaidi Xu
AAML
AI4TS
ViT
13
2
0
28 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
21
14
0
17 Apr 2023
Fairness in Visual Clustering: A Novel Transformer Clustering Approach
Xuan-Bac Nguyen
C. Duong
Marios Savvides
Kaushik Roy
Hugh Churchill
Khoa Luu
18
9
0
14 Apr 2023
ENTL: Embodied Navigation Trajectory Learner
Klemen Kotar
Aaron Walsman
Roozbeh Mottaghi
8
6
0
05 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
19
30
0
03 Apr 2023
MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
19
38
0
03 Apr 2023
Unbiased Scene Graph Generation in Videos
Sayak Nag
Kyle Min
Subarna Tripathi
A. Roy-Chowdhury
19
28
0
03 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
8
4
0
01 Apr 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Wen Wang
Yan Jiang
K. Xie
Zide Liu
Hao Chen
Yue Cao
Xinlong Wang
Chunhua Shen
DiffM
VGen
13
112
0
30 Mar 2023
SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
14
0
0
28 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
19
15
0
28 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
19
11
0
25 Mar 2023
Multi-view knowledge distillation transformer for human action recognition
Yi Lin
Vincent S. Tseng
ViT
10
1
0
25 Mar 2023
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
Yunfan Lu
Zipeng Wang
Minjie Liu
Hongjian Wang
Lin Wang
SupR
16
30
0
24 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
17
1
0
21 Mar 2023
MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation
Chunmeng Liu
Guang-pu Li
Yao Shen
Ruiqi Wang
ViT
17
7
0
19 Mar 2023
Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation
Xiyu Wang
Yuecong Xu
Jianfei Yang
Xiaoli Li
Zhenghua Chen
TTA
16
0
0
18 Mar 2023
Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens
Sen Yang
Wen Heng
Gang Liu
Guozhong Luo
Wankou Yang
Gang Yu
3DH
ViT
18
11
0
01 Mar 2023
LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring
Zhihao Chen
Chuang Niu
Qi Gao
Ge Wang
Hongming Shan
MedIm
ViT
3DV
23
19
0
21 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
60
0
0
18 Feb 2023
One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data
Simone Luetto
Fabrizio Garuti
E. Sangineto
L. Forni
Rita Cucchiara
LMTD
AI4TS
79
10
0
13 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
29
562
0
10 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
24
143
0
06 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
11
18
0
30 Jan 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Ming-Ming Cheng
Jiashi Feng
22
8
0
15 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
16
8
0
03 Jan 2023
A Survey on Human Action Recognition
Zhou Shuchang
16
0
0
20 Dec 2022
SADM: Sequence-Aware Diffusion Model for Longitudinal Medical Image Generation
Jee Seok Yoon
Chenghao Zhang
Heung-Il Suk
Jia Guo
Xiaoxia Li
DiffM
MedIm
9
36
0
16 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
16
13
0
13 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
11
148
0
06 Dec 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Bernard Ghanem
10
17
0
25 Nov 2022
Towards Good Practices for Missing Modality Robust Action Recognition
Sangmin Woo
Sumin Lee
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
22
42
0
25 Nov 2022
Event Transformer+. A multi-purpose solution for efficient event data processing
Alberto Sabater
Luis Montesano
Ana C. Murillo
ViT
16
8
0
22 Nov 2022
Token Turing Machines
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
27
21
0
16 Nov 2022
Attention-based Neural Cellular Automata
Mattie Tesfaldet
Derek Nowrouzezahrai
C. Pal
ViT
10
16
0
02 Nov 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
13
2
0
28 Oct 2022
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data
Huy Hoang Nguyen
Matthew B. Blaschko
S. Saarakkala
A. Tiulpin
MedIm
AI4CE
35
15
0
25 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
20
3
0
24 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
15
18
0
21 Oct 2022