Object-centric Video Representation for Long-term Action Anticipation

31 October 2023

Shijie Wang

Papers citing "Object-centric Video Representation for Long-term Action Anticipation"

Title
Action Anticipation from SoccerNet Football Video Broadcasts Mohamad Dalal Artur Xarles A. Cioppa Silvio Giancola Marc Van Droogenbroeck Bernard Ghanem Albert Clapés Sergio Escalera T. Moeslund AI4TS 36 0 0 16 Apr 2025
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context Manuel Benavent-Lledo David Mulero-Pérez David Ortiz-Perez José García Rodríguez Antonis Argyros 24 0 0 28 Oct 2024
Human Action Anticipation: A Survey Bolin Lai Sam Toyer Tushar Nagarajan Rohit Girdhar S. Zha James M. Rehg Kris M. Kitani Kristen Grauman Ruta Desai Miao Liu AI4TS 41 1 0 17 Oct 2024
TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction Kojiro Takeyama Yimeng Liu Misha Sra 29 1 0 05 Oct 2024
Anticipation through Head Pose Estimation: a preliminary study Federico Figari Tomenotti Nicoletta Noceti 23 0 0 10 Aug 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective Rui Qian Shuangrui Ding Dahua Lin OCL 52 1 0 09 Jul 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models Himangi Mittal Nakul Agarwal Shao-Yuan Lo Kwonjoon Lee 41 14 0 30 May 2024
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos Junyi Ma Jingyi Xu Xieyuanli Chen Hesheng Wang VGen 32 7 0 07 May 2024
Uncertainty-boosted Robust Video Activity Anticipation Zhaobo Qi Shuhui Wang Weigang Zhang Qingming Huang 36 5 0 29 Apr 2024
DITTO: Demonstration Imitation by Trajectory Transformation Nick Heppert Max Argus Tim Welschehold Thomas Brox Abhinav Valada 67 16 0 22 Mar 2024
VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation Congqi Cao Ze Sun Qinyi Lv Lingtong Min Yanning Zhang ViT 18 3 0 08 Jul 2023
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation Razvan-George Pasca Alexey Gavryushin Muhammad Hamza Yen-Ling Kuo Kaichun Mo Luc Van Gool Otmar Hilliges Xi Wang 22 14 0 22 Jan 2023
Omnivore: A Single Model for Many Visual Modalities Rohit Girdhar Mannat Singh Nikhil Ravi Laurens van der Maaten Armand Joulin Ishan Misra 223 225 0 20 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 232 1,024 0 13 Oct 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Xiuye Gu Tsung-Yi Lin Weicheng Kuo Yin Cui VLM ObjD 225 899 0 28 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Yin Cui Boqing Gong ViT 248 577 0 22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 280 1,982 0 09 Feb 2021
Feature Pyramid Networks for Object Detection Tsung-Yi Lin Piotr Dollár Ross B. Girshick Kaiming He Bharath Hariharan Serge J. Belongie ObjD 183 21,819 0 09 Dec 2016
Efficient Estimation of Word Representations in Vector Space Tomáš Mikolov Kai Chen G. Corrado J. Dean 3DV 245 31,257 0 16 Jan 2013