ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.15691
  4. Cited By
ViViT: A Video Vision Transformer

ViViT: A Video Vision Transformer

29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
    ViT
ArXivPDFHTML

Papers citing "ViViT: A Video Vision Transformer"

50 / 237 papers shown
Title
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin
Stella X.Yu
19
0
0
09 May 2025
Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
Tim Schneider
Cristiana de Farias
Roberto Calandra
L. Chen
Jan Peters
36
0
0
09 May 2025
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Junhe Zhang
Wanli Ni
Pengwei Wang
Dongyu Wang
9
0
0
06 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
22
0
0
06 May 2025
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor
Wei-Ting Chen
Yu-Jiet Vong
Yi-Tsung Lee
Sy-Yen Kuo
Qiang Gao
Sizhuo Ma
Jian Wang
61
0
0
06 May 2025
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions
Kanghyun Jo
Jehwan Choi
Kwanho Kim
Seongmin Kim
Duy-Linh Nguyen
Xuan-Thuy Vo
Adri Priadana
Tien-Dat Tran
AI4CE
32
0
0
06 May 2025
Token Coordinated Prompt Attention is Needed for Visual Prompting
Token Coordinated Prompt Attention is Needed for Visual Prompting
Zichen Liu
Xu Zou
Gang Hua
Jiahuan Zhou
26
0
0
05 May 2025
A Deep Learning approach for Depressive Symptoms assessment in Parkinson's disease patients using facial videos
A Deep Learning approach for Depressive Symptoms assessment in Parkinson's disease patients using facial videos
Ioannis Kyprakis
Vasileios Skaramagkas
Iro Boura
Georgios Karamanis
D. Fotiadis
Zinovia Kefalopoulou
Cleanthe Spanaki
M. Tsiknakis
29
0
0
05 May 2025
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
15
0
0
03 May 2025
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao-quan Song
Jiahao Zhang
Jiale Zhao
EGVM
VGen
PINN
75
1
0
01 May 2025
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Hugo Georgenthum
Cristian Cosentino
Fabrizio Marozzo
Pietro Liò
MedIm
54
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
76
0
0
28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
70
0
0
28 Apr 2025
Video CLIP Model for Multi-View Echocardiography Interpretation
Video CLIP Model for Multi-View Echocardiography Interpretation
Ryo Takizawa
Satoshi Kodera
Tempei Kabayama
Ryo Matsuoka
Yuta Ando
Yuto Nakamura
Haruki Settai
Norihiko Takeda
27
0
0
26 Apr 2025
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Guodong Shen
Yuqi Ouyang
Junru Lu
Yixuan Yang
Victor Sanchez
23
1
0
20 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
57
0
0
20 Apr 2025
Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition
Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition
Alexander Brettmann
Jakob Grävinghoff
Marlene Rüschoff
Marie Westhues
SLR
46
0
0
10 Apr 2025
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
41
0
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
32
0
0
02 Apr 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
121
0
0
26 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Q. Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Enhong Chen
3DV
61
3
0
11 Mar 2025
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang
Kairong Yu
Xian Zhong
Hongwei Wang
Qi Xu
Qiang Zhang
83
0
0
04 Mar 2025
Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking
Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking
Shaheer Mohamed
Tharindu Fernando
S. Sridharan
Peyman Moghadam
Clinton Fookes
ViT
61
1
0
26 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao-quan Song
Yufa Zhou
86
17
0
21 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
38
0
0
11 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
68
1
0
28 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
35
0
0
22 Jan 2025
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao
M. Jogan
Marcel Hussing
Kenta Nakahashi
Kazuhiro Yasufuku
Amin Madani
Eric Eaton
Daniel A. Hashimoto
43
0
0
21 Jan 2025
Counteracting temporal attacks in Video Copy Detection
Counteracting temporal attacks in Video Copy Detection
Katarzyna Fojcik
Piotr Syga
AAML
33
0
0
19 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
95
16
0
17 Jan 2025
DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting
DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting
Hao Wu
Haomin Wen
Guibin Zhang
Yutong Xia
Kai Wang
Yuxuan Liang
Yu Zheng
Kun Wang
60
2
0
17 Jan 2025
Measuring Error Alignment for Decision-Making Systems
Measuring Error Alignment for Decision-Making Systems
Binxia Xu
Antonis Bikakis
Daniel Onah
A. Vlachidis
Luke Dickens
34
0
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz
Paweł Kapusta
33
0
0
31 Dec 2024
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Yunshan Zhong
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Fei Chao
Zhanpeng Zeng
Rongrong Ji
MQ
87
0
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
56
23
0
31 Dec 2024
DRDM: A Disentangled Representations Diffusion Model for Synthesizing
  Realistic Person Images
DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images
Enbo Huang
Yuan Zhang
Faliang Huang
Guangyu Zhang
Y. Liu
DiffM
37
0
0
25 Dec 2024
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Federico Spurio
Emad Bahrami
Gianpiero Francesca
Juergen Gall
39
0
0
23 Dec 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
72
1
0
28 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
73
0
0
20 Nov 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
67
8
0
31 Oct 2024
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World
  Model Disentanglement
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Zhi Wang
Li Lyna Zhang
Wenhao Wu
Yuanheng Zhu
Dongbin Zhao
C. L. Philip Chen
OffRL
30
6
0
15 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM
VGen
33
0
0
10 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
65
59
0
09 Oct 2024
ECHOPulse: ECG controlled echocardio-grams video generation
ECHOPulse: ECG controlled echocardio-grams video generation
Yiwei Li
Sekeun Kim
Zihao Wu
Hanqi Jiang
Yi Pan
...
Sifan Song
Yucheng Shi
Tianming Liu
Quanzheng Li
Xiang Li
VGen
21
1
0
04 Oct 2024
Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data
Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data
Lukas Heine
Fabian Horst
Jana Fragemann
Gijs Luijten
M. Balzer
Jan Egger
F. Bahnsen
M. Sarfraz
Jens Kleesiek
15
0
0
25 Sep 2024
Learning to Discover Forgery Cues for Face Forgery Detection
Learning to Discover Forgery Cues for Face Forgery Detection
Jiahe Tian
Peng-Wen Chen
Cai Yu
Xiaomeng Fu
Xi Wang
Jiao Dai
Jizhong Han
CVBM
AAML
18
6
0
02 Sep 2024
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
Yuxin Zhu
Huiyu Duan
Kaiwei Zhang
Yucheng Zhu
Xilei Zhu
Long Teng
Xiongkuo Min
Guangtao Zhai
64
1
0
10 Aug 2024
Causal Understanding For Video Question Answering
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
33
0
0
23 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
52
3
0
20 Jul 2024
Hierarchical Separable Video Transformer for Snapshot Compressive
  Imaging
Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Ping Wang
Yulun Zhang
Lishun Wang
Xin Yuan
ViT
26
1
0
16 Jul 2024
12345
Next