2103.15691
ViViT: A Video Vision Transformer
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
Papers citing
"ViViT: A Video Vision Transformer"
50 / 237 papers shown
Title
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Improving ensemble extreme precipitation forecasts using generative artificial intelligence
Yingkai Sha
R. Sobash
David John Gagne II
18
0
0
05 Jul 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
68
0
0
21 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
51
3
0
12 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
Ana Ezquerro
David Vilares
21
1
0
10 May 2024
Compression-Realized Deep Structural Network for Video Quality Enhancement
Hanchi Sun
Xiaohong Liu
Xinyang Jiang
Yifei Shen
Dongsheng Li
Xiongkuo Min
Guangtao Zhai
25
1
0
10 May 2024
Deep Learning for Melt Pool Depth Contour Prediction From Surface Thermal Images via Vision Transformers
Francis Ogoke
P. Pak
Alexander J. Myers
Guadalupe Quirarte
Jack L. Beuth
Jonathan A. Malen
A. Farimani
AI4CE
ViT
14
2
0
26 Apr 2024
MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information
Jiajun Liang
Baoquan Zhang
Yunming Ye
Xutao Li
Chuyao Luo
Xukai Fu
24
0
0
26 Apr 2024
ThermoPore: Predicting Part Porosity Based on Thermal Images Using Deep Learning
P. Pak
Francis Ogoke
Andrew Polonsky
Anthony Garland
D. Bolintineanu
Dan R. Moser
Michael J. Heiden
A. Farimani
14
4
0
23 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
23
0
0
20 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
36
7
0
28 Mar 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
27
2
0
28 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
30
4
0
24 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Temporal Enhanced Floating Car Observers
Jeremias Gerner
Klaus Bogenberger
Stefanie Schmidtner
14
1
0
06 Mar 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
28
1
0
23 Feb 2024
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
Amir Ziai
Aneesh Vartakavi
VLM
VGen
19
0
0
09 Feb 2024
Let Your Graph Do the Talking: Encoding Structured Data for LLMs
Bryan Perozzi
Bahare Fatemi
Dustin Zelle
Anton Tsitsulin
Mehran Kazemi
Rami Al-Rfou
Jonathan J. Halcrow
GNN
24
55
0
08 Feb 2024
Deepfake Detection and the Impact of Limited Computing Capabilities
Paloma Cantero-Arjona
Alfonso Sánchez-Macián
26
2
0
08 Feb 2024
Motion Consistency Loss for Monocular Visual Odometry with Attention-Based Deep Learning
André O. Françani
Marcos R. O. A. Máximo
17
0
0
19 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
39
35
0
16 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
52
0
0
15 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Z. Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
123
227
0
05 Jan 2024
Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features
Yuming Huang
Yingpin Chen
Changhui Wu
Hanrong Xie
Binhui Song
Hui Wang
SupR
ViT
19
0
0
30 Dec 2023
A brief introduction to a framework named Multilevel Guidance-Exploration Network
Guoqing Yang
Zhiming Luo
Jianzhe Gao
Yingxin Lai
Kun Yang
Yifan He
Shaozi Li
3DH
16
0
0
07 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
32
4
0
05 Dec 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Zhen Xing
Qi Dai
Zihao Zhang
Hui Zhang
Hang-Rui Hu
Zuxuan Wu
Yu-Gang Jiang
VGen
30
17
0
30 Nov 2023
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene
Michael Maynord
Cornelia Fermuller
Yiannis Aloimonos
13
3
0
29 Nov 2023
Echocardiogram Foundation Model -- Application 1: Estimating Ejection Fraction
Adil Dahlan
C. Zakka
Abhinav Kumar
Laura Tang
R. Shad
R. Fong
W. Hiesinger
6
2
0
21 Nov 2023
Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition
T. Fujii
Hayato Nakagawa
T. Takeshima
Y. Yumura
T. Hamagami
25
3
0
10 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
S3Aug: Segmentation, Sampling, and Shift for Action Recognition
Taiki Sugiura
Toru Tamaki
AI4TS
22
2
0
23 Oct 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
33
5
0
19 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
29
3
0
10 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
23
1
0
20 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
15
9
0
05 Sep 2023
Detection of Mild Cognitive Impairment Using Facial Features in Video Conversations
Muath Alsuhaibani
H. H. Dodge
Mohammad H. Mahoor
CVBM
8
3
0
29 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
25
0
0
24 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
11
3
0
23 Aug 2023
SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting
Shengsheng Lin
Weiwei Lin
Wentai Wu
Feiyu Zhao
Ruichao Mo
Haotong Zhang
AI4TS
18
48
0
22 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
17
53
0
21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
29
0
21 Aug 2023
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang
Ruisi Cai
Tianlong Chen
Guanhua Zhang
Huan Zhang
Pin-Yu Chen
Shiyu Chang
Zhangyang Wang
Sijia Liu
MoE
AAML
OOD
17
16
0
19 Aug 2023
Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review
Nicole L. Robinson
Brendan Tidd
Dylan Campbell
Dana Kulić
Peter Corke
30
54
0
28 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
17
7
0
27 Jul 2023
GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos
Nisarg A. Shah
S. Sikder
S. Vedula
Vishal M. Patel
ViT
MedIm
12
7
0
20 Jul 2023