ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.15691
  4. Cited By
ViViT: A Video Vision Transformer

ViViT: A Video Vision Transformer

29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
    ViT
ArXivPDFHTML

Papers citing "ViViT: A Video Vision Transformer"

37 / 237 papers shown
Title
TransDARC: Transformer-based Driver Activity Recognition with Latent
  Space Feature Calibration
TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
27
32
0
02 Mar 2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng
Xuanlei Zhao
Guangyang Lu
Bin-Rui Li
Zhongming Yu
Tian Zheng
R. Wu
Xiwen Zhang
Jian Peng
Yang You
AI4CE
9
30
0
02 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary
  Detection
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
Delving Deep into One-Shot Skeleton-based Action Recognition with
  Diverse Occlusions
Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
19
27
0
23 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual
  Recognition
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
142
360
0
24 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
13
164
0
20 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
14
11
0
17 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal
  Representation Learning
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
14
235
0
12 Jan 2022
RFormer: Transformer-based Generative Adversarial Network for Real
  Fundus Image Restoration on A New Clinical Benchmark
RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark
Zhuo Deng
Yuanhao Cai
Lu Chen
Zheng Gong
Qiqi Bao
Xue Yao
D. Fang
Shaochong Zhang
Lan Ma
ViT
MedIm
13
52
0
03 Jan 2022
A Simple Single-Scale Vision Transformer for Object Localization and
  Instance Segmentation
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
19
20
0
17 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
25
651
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action
  Recognition
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video
  Recognition
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
15
21
0
09 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
25
671
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
  for Zero-shot and Few-shot Tasks
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
27
126
0
02 Dec 2021
Self-supervised Video Transformer
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
ViT
18
84
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
17
23
0
02 Dec 2021
Multi-domain Integrative Swin Transformer network for Sparse-View
  Tomographic Reconstruction
Multi-domain Integrative Swin Transformer network for Sparse-View Tomographic Reconstruction
Jiayi Pan
Heye Zhang
Weifei Wu
Z. Gao
Weiwen Wu
11
57
0
28 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
31
73
0
25 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal
  Difference Transformer
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
125
162
0
23 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
20
1,713
0
18 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
11
20
0
28 Oct 2021
The Efficiency Misnomer
The Efficiency Misnomer
Daoyuan Chen
Liuyi Yao
Dawei Gao
Ashish Vaswani
Yaliang Li
17
96
0
25 Oct 2021
SCENIC: A JAX Library for Computer Vision Research and Beyond
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
41
67
0
18 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
22
16
0
12 Oct 2021
Exploring the Limits of Large Scale Pre-training
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
32
114
0
05 Oct 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
27
40
0
22 Jul 2021
Is attention to bounding boxes all you need for pedestrian action
  prediction?
Is attention to bounding boxes all you need for pedestrian action prediction?
Lina Achaji
Julien Moreau
Thibault Fouqueray
François Aioun
François Charpillet
16
29
0
16 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
23
536
0
30 Jun 2021
Space-time Mixing Attention for Video Transformer
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
16
124
0
10 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for
  Signal Recognition
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
38
8
0
05 Jun 2021
Vision Transformers with Patch Diversification
Vision Transformers with Patch Diversification
Chengyue Gong
Dilin Wang
Meng Li
Vikas Chandra
Qiang Liu
ViT
32
62
0
26 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
193
375
0
01 Feb 2021
Bottleneck Transformers for Visual Recognition
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
267
955
0
27 Jan 2021
Previous
12345