1812.02707
Cited By
Video Action Transformer Network
6 December 2018
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
Papers citing
"Video Action Transformer Network"
50 / 119 papers shown
Title
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
24
0
0
03 May 2025
Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment
Keyne Oei
Amr Gomaa
Anna Maria Feit
João Belo
26
0
0
06 Sep 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
63
3
0
20 Jul 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
34
0
0
11 Jun 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Chanyeon Kim
Jongwoon Park
Hyun-sool Bae
Woo Chang Kim
42
3
0
03 Apr 2024
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
25
0
0
21 Nov 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
20
9
0
05 Sep 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shen Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
22
53
0
21 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection
Peng Wang
Fanwei Zeng
Yu Qian
26
5
0
03 Aug 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
32
13
0
24 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
36
14
0
17 Apr 2023
ChiroDiff: Modelling chirographic data with Diffusion Models
Ayan Das
Yongxin Yang
Timothy M. Hospedales
Tao Xiang
Yi-Zhe Song
DiffM
24
10
0
07 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
13
4
0
01 Apr 2023
YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection
Jianhua Yang
Kun Dai
ObjD
16
17
0
14 Feb 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
15
25
0
05 Jan 2023
Inductive Attention for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
O. Lanz
31
1
0
17 Dec 2022
MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction
Shuliang Ning
Mengcheng Lan
Yanran Li
Chaofeng Chen
Qian Chen
Xunlai Chen
Xiaoguang Han
Shuguang Cui
28
20
0
09 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
32
16
0
08 Dec 2022
MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection
Y. Chen
Zhengzhe Liu
Baoheng Zhang
W. Fok
Xiaojuan Qi
Yik-Chung Wu
10
109
0
28 Nov 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Bernard Ghanem
19
17
0
25 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Yu Wu
Cewu Lu
22
7
0
14 Nov 2022
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data
Huy Hoang Nguyen
Matthew B. Blaschko
S. Saarakkala
A. Tiulpin
MedIm
AI4CE
48
15
0
25 Oct 2022
Exploring Self-Attention for Crop-type Classification Explainability
Ivica Obadic
R. Roscher
Dario Augusto Borges Oliveira
Xiao Xiang Zhu
22
7
0
24 Oct 2022
Holistic Interaction Transformer Network for Action Detection
Gueter Josmy Faure
Min-Hung Chen
S. Lai
33
37
0
23 Oct 2022
Rethinking Learning Approaches for Long-Term Action Anticipation
Megha Nawhal
Akash Abdu Jyothi
Greg Mori
AI4TS
34
26
0
20 Oct 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
22
13
0
19 Oct 2022
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
Dasom Ahn
Sangwon Kim
H. Hong
ByoungChul Ko
ViT
26
96
0
14 Oct 2022
On the Learning Mechanisms in Physical Reasoning
Shiqian Li
Ke Wu
Chi Zhang
Yixin Zhu
AI4CE
44
13
0
05 Oct 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
Erica K. Shimomoto
Edison Marrese-Taylor
Hiroya Takamura
Ichiro Kobayashi
Hideki Nakayama
Yusuke Miyao
21
7
0
26 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Saeed Mian
ViT
19
44
0
13 Sep 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
31
26
0
20 Jul 2022
Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal
Xuhao Jiang
Weimin Tan
Ri Cheng
Shili Zhou
Bo Yan
ViT
11
6
0
15 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
25
8
0
08 Jul 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
27
97
0
16 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Ran Liu
Mehdi Azabou
M. Dabagia
Jingyun Xiao
Eva L. Dyer
AI4CE
27
19
0
10 Jun 2022
Do we really need temporal convolutions in action segmentation?
Dazhao Du
Bing-Huang Su
Yu Li
Zhongang Qi
Lingyu Si
Ying Shan
ViT
21
16
0
26 May 2022
VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation
Yuxing Chen
Renshu Gu
Ouhan Huang
Gangyong Jia
3DH
33
11
0
25 May 2022
Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition
Kazuki Omi
Jun Kimata
Toru Tamaki
21
7
0
15 Apr 2022
A Transformer-Based Contrastive Learning Approach for Few-Shot Sign Language Recognition
Silvan Ferreira
Esdras Costa
M. Dahia
J. Rocha
SLR
9
1
0
05 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
22
93
0
04 Apr 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
F. Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
22
28
0
24 Mar 2022
Point3D: tracking actions as moving points with 3D CNNs
Shentong Mo
Jingfei Xia
Xiaoqing Ellen Tan
Bhiksha Raj
3DPC
18
5
0
20 Mar 2022
CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction
Zhuoran Song
Yihong Xu
Zhezhi He
Li Jiang
Naifeng Jing
Xiaoyao Liang
ViT
18
39
0
09 Mar 2022
Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences
G. Moon
E. Cyr
20
5
0
07 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Yuanzhong Liu
Junsong Yuan
Zhigang Tu
19
58
0
24 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chenlin Zhang
Jianxin Wu
Yin Li
ViT
23
328
0
16 Feb 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
103
0
16 Jan 2022
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization
Haopeng Li
Qiuhong Ke
Mingming Gong
Zhang Rui
ViT
26
22
0
27 Dec 2021