Video Action Transformer Network

6 December 2018

Papers citing "Video Action Transformer Network"

50 / 119 papers shown

Title
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study Tamim Ahmed Thanassis Rikakis 24 0 0 03 May 2025
Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment Keyne Oei Amr Gomaa Anna Maria Feit João Belo 26 0 0 06 Sep 2024
A Comprehensive Review of Few-shot Action Recognition Yuyang Wanyan Xiaoshan Yang Weiming Dong Changsheng Xu VLM 63 3 0 20 Jul 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD Ioanna Ntinou Enrique Sanchez Georgios Tzimiropoulos 34 0 0 11 Jun 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization Chanyeon Kim Jongwoon Park Hyun-sool Bae Woo Chang Kim 42 3 0 03 Apr 2024
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition Sumin Lee Sangmin Woo Muhammad Adi Nugroho Changick Kim 25 0 0 21 Nov 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding Yue Xu Yong-Lu Li Zhemin Huang Michael Xu Liu Cewu Lu Yu-Wing Tai Chi-Keung Tang EgoV 20 9 0 05 Sep 2023
UnLoc: A Unified Framework for Video Localization Tasks Shen Yan Xuehan Xiong Arsha Nagrani Anurag Arnab Zhonghao Wang Weina Ge David A. Ross Cordelia Schmid 22 53 0 21 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection Peng Wang Fanwei Zeng Yu Qian 26 5 0 03 Aug 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers A. Gritsenko Xuehan Xiong Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid Anurag Arnab ViT 32 13 0 24 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement Lei Chen Zhan Tong Yibing Song Gangshan Wu Limin Wang 36 14 0 17 Apr 2023
ChiroDiff: Modelling chirographic data with Diffusion Models Ayan Das Yongxin Yang Timothy M. Hospedales Tao Xiang Yi-Zhe Song DiffM 24 10 0 07 Apr 2023
DOAD: Decoupled One Stage Action Detection Network Shuning Chang Pichao Wang Fan Wang Jiashi Feng Mike Zheng Shou 13 4 0 01 Apr 2023
YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection Jianhua Yang Kun Dai ObjD 16 17 0 14 Feb 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention Shashanka Venkataramanan Amir Ghodrati Yuki M. Asano Fatih Porikli A. Habibian ViT 15 25 0 05 Jan 2023
Inductive Attention for Video Action Anticipation Tsung-Ming Tai G. Fiameni Cheng-Kuang Lee Simon See O. Lanz 31 1 0 17 Dec 2022
MIMO Is All You Need : A Strong Multi-In-Multi-Out Baseline for Video Prediction Shuliang Ning Mengcheng Lan Yanran Li Chaofeng Chen Qian Chen Xunlai Chen Xiaoguang Han Shuguang Cui 28 20 0 09 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data Roei Herzig Ofir Abramovich Elad Ben-Avraham Assaf Arbelle Leonid Karlinsky Ariel Shamir Trevor Darrell Amir Globerson 32 16 0 08 Dec 2022
MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection Y. Chen Zhengzhe Liu Baoheng Zhang W. Fok Xiaojuan Qi Yik-Chung Wu 10 109 0 28 Nov 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization Chen Zhao Shuming Liu K. Mangalam Bernard Ghanem 19 17 0 25 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions Yong-Lu Li Hongwei Fan Zuoyu Qiu Yiming Dou Liang Xu ... Peiyang Guo Haisheng Su Dongliang Wang Wei Yu Wu Cewu Lu 22 7 0 14 Nov 2022
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data Huy Hoang Nguyen Matthew B. Blaschko S. Saarakkala A. Tiulpin MedIm AI4CE 48 15 0 25 Oct 2022
Exploring Self-Attention for Crop-type Classification Explainability Ivica Obadic R. Roscher Dario Augusto Borges Oliveira Xiao Xiang Zhu 22 7 0 24 Oct 2022
Holistic Interaction Transformer Network for Action Detection Gueter Josmy Faure Min-Hung Chen S. Lai 33 37 0 23 Oct 2022
Rethinking Learning Approaches for Long-Term Action Anticipation Megha Nawhal Akash Abdu Jyothi Greg Mori AI4TS 34 26 0 20 Oct 2022
Grounded Video Situation Recognition Zeeshan Khan C. V. Jawahar Makarand Tapaswi 22 13 0 19 Oct 2022
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition Dasom Ahn Sangwon Kim H. Hong ByoungChul Ko ViT 26 96 0 14 Oct 2022
On the Learning Mechanisms in Physical Reasoning Shiqian Li Ke Wu Chi Zhang Yixin Zhu AI4CE 44 13 0 05 Oct 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding Erica K. Shimomoto Edison Marrese-Taylor Hiroya Takamura Ichiro Kobayashi Hideki Nakayama Yusuke Miyao 21 7 0 26 Sep 2022
Vision Transformers for Action Recognition: A Survey Anwaar Ulhaq Naveed Akhtar Ganna Pogrebna Ajmal Saeed Mian ViT 19 44 0 13 Sep 2022
Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang Ankush Gupta Andrew Zisserman ViT 31 26 0 20 Jul 2022
Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal Xuhao Jiang Weimin Tan Ri Cheng Shili Zhou Bo Yan ViT 11 6 0 15 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation Anurag Arnab Xuehan Xiong A. Gritsenko Rob Romijnders Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid 25 8 0 08 Jul 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos Rohit Girdhar Alaaeldin El-Nouby Mannat Singh Kalyan Vasudev Alwala Armand Joulin Ishan Misra ViT 27 97 0 16 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers Ran Liu Mehdi Azabou M. Dabagia Jingyun Xiao Eva L. Dyer AI4CE 27 19 0 10 Jun 2022
Do we really need temporal convolutions in action segmentation? Dazhao Du Bing-Huang Su Yu Li Zhongang Qi Lingyu Si Ying Shan ViT 21 16 0 26 May 2022
VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation Yuxing Chen Renshu Gu Ouhan Huang Gangyong Jia 3DH 33 11 0 25 May 2022
Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition Kazuki Omi Jun Kimata Toru Tamaki 21 7 0 15 Apr 2022
A Transformer-Based Contrastive Learning Approach for Few-Shot Sign Language Recognition Silvan Ferreira Esdras Costa M. Dahia J. Rocha SLR 9 1 0 05 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos Shao-Wei Liu Subarna Tripathi Somdeb Majumdar Xiaolong Wang EgoV 22 93 0 04 Apr 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness Giulio Lovisotto Nicole Finnie Mauricio Muñoz Chaithanya Kumar Mummadi J. H. Metzen AAML ViT 17 32 0 25 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review Yuting Yang Licheng Jiao Xuantong Liu F. Liu Shuyuan Yang Zhixi Feng Xu Tang ViT MedIm 22 28 0 24 Mar 2022
Point3D: tracking actions as moving points with 3D CNNs Shentong Mo Jingfei Xia Xiaoqing Ellen Tan Bhiksha Raj 3DPC 18 5 0 20 Mar 2022
CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction Zhuoran Song Yihong Xu Zhezhi He Li Jiang Naifeng Jing Xiaoyao Liang ViT 18 39 0 09 Mar 2022
Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences G. Moon E. Cyr 20 5 0 07 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition Yuanzhong Liu Junsong Yuan Zhigang Tu 19 58 0 24 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs Oana Ignat Santiago Castro Yuhang Zhou Jiajun Bao Dandan Shan Rada Mihalcea 18 3 0 16 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers Chenlin Zhang Jianxin Wu Yin Li ViT 23 328 0 16 Feb 2022
Video Transformers: A Survey Javier Selva A. S. Johansen Sergio Escalera Kamal Nasrollahi T. Moeslund Albert Clapés ViT 20 103 0 16 Jan 2022
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization Haopeng Li Qiuhong Ke Mingming Gong Zhang Rui ViT 26 22 0 27 Dec 2021