Anticipative Video Transformer

3 June 2021

Papers citing "Anticipative Video Transformer"

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Haibo Wang Bo Feng Zhengfeng Lai Mingze Xu Shiyu Li Weifeng Ge Afshin Dehghan Meng Cao Ping-Chia Huang OffRL 49 0 0 08 May 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding Dibyadip Chatterjee Edoardo Remelli Yale Song Bugra Tekin Abhay Mittal ... Shreyas Hampali Eric Sauser Shugao Ma Angela Yao Fadime Sener VLM 40 0 0 10 Apr 2025
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos Lorenzo Mur-Labadia Josechu Guerrero Ruben Martinez-Cantin VGen 56 0 0 11 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning Baoqi Pei Y. Huang Jilan Xu Guo Chen Yuping He ... Yali Wang Weidi Xie Yu Qiao Fei Wu Limin Wang 41 0 0 02 Mar 2025
EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training Qingyao Tian Huai Liao Xinyan Huang Bingyu Yang Dongdong Lei Sebastien Ourselin Hongbin Liu Mamba 68 0 0 26 Feb 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos Luigi Seminara G. Farinella Antonino Furnari 56 7 0 10 Jan 2025
ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh Tushar Nagarajan Georgios Pavlakos Kris M. Kitani Kristen Grauman VGen 44 2 0 01 Aug 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models Himangi Mittal Nakul Agarwal Shao-Yuan Lo Kwonjoon Lee 30 14 0 30 May 2024
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind Chiara Plizzari Shubham Goel Toby Perrett Jacob Chalk Angjoo Kanazawa Dima Damen 33 10 0 07 Apr 2024
Koala: Key frame-conditioned long video-LLM Reuben Tan Ximeng Sun Ping Hu Jui-hsien Wang Hanieh Deilamsalehy Bryan A. Plummer Bryan C. Russell Kate Saenko 38 35 0 05 Apr 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World Yifei Huang Guo Chen Jilan Xu Mingfang Zhang Lijin Yang ... Hongjie Zhang Lu Dong Yali Wang Limin Wang Yu Qiao EgoV 57 36 0 24 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding Ahmad A Mahmood Ashmal Vayani Muzammal Naseer Salman Khan Fahad Shahbaz Khan LRM 49 7 0 21 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition Md Salman Shamil Dibyadip Chatterjee Fadime Sener Shugao Ma Angela Yao 32 5 0 14 Mar 2024
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains Rohan Myer Krishnan Zitian Tang Zhiqiu Yu Chen Sun 51 1 0 30 Nov 2023
An Outlook into the Future of Egocentric Vision Chiara Plizzari Gabriele Goletto Antonino Furnari Siddhant Bansal Francesco Ragusa G. Farinella Dima Damen Tatiana Tommasi EgoV 32 38 0 14 Aug 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao Shijie Wang Ce Zhang Changcheng Fu Minh Quan Do Nakul Agarwal Kwonjoon Lee Chen Sun LM&Ro 46 49 0 31 Jul 2023
Affordances from Human Videos as a Versatile Representation for Robotics Shikhar Bahl Russell Mendonca Lili Chen Unnat Jain Deepak Pathak 35 160 0 17 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding Honglu Zhou Roberto Martín-Martín Mubbasir Kapadia Silvio Savarese Juan Carlos Niebles 23 38 0 31 Mar 2023
HierVL: Learning Hierarchical Video-Language Embeddings Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman VLM AI4TS 20 51 0 05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman 11 4 0 05 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory Santhosh Kumar Ramakrishnan Ziad Al-Halah Kristen Grauman 106 39 0 02 Jan 2023
Inductive Attention for Video Action Anticipation Tsung-Ming Tai G. Fiameni Cheng-Kuang Lee Simon See O. Lanz 31 1 0 17 Dec 2022
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU Saghar Irandoust Thibaut Durand Yunduz Rakhmangulova Wenjie Zi Hossein Hajimirsadeghi ViT 33 6 0 09 Nov 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction Samrudhdhi B. Rangrej Kevin J Liang Tal Hassner James J. Clark 25 3 0 24 Oct 2022
Rethinking Learning Approaches for Long-Term Action Anticipation Megha Nawhal Akash Abdu Jyothi Greg Mori AI4TS 34 26 0 20 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions Himangi Mittal Pedro Morgado Unnat Jain Abhinav Gupta 66 22 0 27 Sep 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain Francesco Ragusa Antonino Furnari G. Farinella EgoV 33 23 0 19 Sep 2022
Vision Transformers for Action Recognition: A Survey Anwaar Ulhaq Naveed Akhtar Ganna Pogrebna Ajmal Saeed Mian ViT 19 44 0 13 Sep 2022
Predicting the Next Action by Modeling the Abstract Goal Debaditya Roy Basura Fernando EgoV 16 18 0 12 Sep 2022
Expanding Language-Image Pretrained Models for General Video Recognition Bolin Ni Houwen Peng Minghao Chen Songyang Zhang Gaofeng Meng Jianlong Fu Shiming Xiang Haibin Ling VLM CLIP ViT 23 312 0 04 Aug 2022
EgoEnv: Human-centric environment representations from egocentric video Tushar Nagarajan Santhosh Kumar Ramakrishnan Ruta Desai James M. Hillis Kristen Grauman EgoV 21 19 0 22 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network Nikolaos Gkalelis Dimitrios Daskalakis Vasileios Mezaris 8 10 0 20 Jul 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos Rohit Girdhar Alaaeldin El-Nouby Mannat Singh Kalyan Vasudev Alwala Armand Joulin Ishan Misra ViT 27 97 0 16 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection Junwen Chen Gaurav Mittal Ye Yu Yu Kong Mei Chen 33 33 0 09 Jun 2022
Unified Recurrence Modeling for Video Action Anticipation Tsung-Ming Tai G. Fiameni Cheng-Kuang Lee Simon See O. Lanz 19 8 0 02 Jun 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos Shao-Wei Liu Subarna Tripathi Somdeb Majumdar Xiaolong Wang EgoV 22 93 0 04 Apr 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos Tomáš Souček Jean-Baptiste Alayrac Antoine Miech Ivan Laptev Josef Sivic 19 32 0 22 Mar 2022
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis Dominik Rivoir Isabel Funke Stefanie Speidel 19 15 0 15 Mar 2022
Visual Acoustic Matching Changan Chen Ruohan Gao P. Calamia Kristen Grauman 16 55 0 14 Feb 2022
Video Transformers: A Survey Javier Selva A. S. Johansen Sergio Escalera Kamal Nasrollahi T. Moeslund Albert Clapés ViT 20 103 0 16 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound Rowan Zellers Jiasen Lu Ximing Lu Youngjae Yu Yanpeng Zhao Mohammadreza Salehi Aditya Kusupati Jack Hessel Ali Farhadi Yejin Choi 26 207 0 07 Jan 2022
SWAT: Spatial Structure Within and Among Tokens Kumara Kahatapitiya Michael S. Ryoo 23 6 0 26 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,018 0 13 Oct 2021
EAN: Event Adaptive Network for Enhanced Action Recognition Yuan Tian Yichao Yan Guangtao Zhai G. Guo Zhiyong Gao 27 41 0 22 Jul 2021
VidTr: Video Transformer Without Convolutions Yanyi Zhang Xinyu Li Chunhui Liu Bing Shuai Yi Zhu Biagio Brattoli Hao Chen I. Marsic Joseph Tighe ViT 136 193 0 23 Apr 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 280 1,981 0 09 Feb 2021
Video Transformer Network Daniel Neimark Omri Bar Maya Zohar Dotan Asselmann ViT 193 421 0 01 Feb 2021
Forecasting Action through Contact Representations from First Person Video Eadom Dessalene Chinmaya Devaraj Michael Maynord Cornelia Fermuller Yiannis Aloimonos EgoV 58 60 0 01 Feb 2021
Generic Event Boundary Detection: A Benchmark for Event Segmentation Mike Zheng Shou Stan Weixian Lei Weiyao Wang Deepti Ghadiyaram Matt Feiszli VOS 85 76 0 26 Jan 2021
Learning to Anticipate Egocentric Actions by Imagination Yu Wu Linchao Zhu Xiaohan Wang Yi Yang Fei Wu EgoV 77 69 0 13 Jan 2021