Timeception for Complex Action Recognition

4 December 2018

Papers citing "Timeception for Complex Action Recognition"

47 / 47 papers shown

Title
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation Congqi Cao Lanshu Hu Yating Yu Y. Zhang VLM 135 0 0 03 May 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad Vibhav Vineet Y. S. Rawat VLM 125 1 0 11 Mar 2025
Situational Scene Graph for Structured Human-centric Situation Understanding Chinthani Sugandhika Chen Li Deepu Rajan Basura Fernando 140 1 0 30 Oct 2024
Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network Hao Xing Darius Burschka 34 11 0 10 Oct 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory Dingxin Cheng Mingda Li Jingyu Liu Yongxin Guo Bin Jiang Qingbin Liu Xi Chen Bo Zhao 30 4 0 10 Sep 2024
A New Lightweight Hybrid Graph Convolutional Neural Network -- CNN Scheme for Scene Classification using Object Detection Inference Ayman Beghdadi Azeddine Beghdadi M. Ullah F. A. Cheikh Malik Mallem 24 1 0 19 Jul 2024
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition Jiaming Zhou Hanjun Li Kun-Yu Lin Junwei Liang 21 1 0 28 Nov 2023
Training a Large Video Model on a Single Machine in a Day Yue Zhao Philipp Krahenbuhl VLM 29 15 0 28 Sep 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao Shijie Wang Ce Zhang Changcheng Fu Minh Quan Do Nakul Agarwal Kwonjoon Lee Chen Sun LM&Ro 46 49 0 31 Jul 2023
HierVL: Learning Hierarchical Video-Language Embeddings Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman VLM AI4TS 20 51 0 05 Jan 2023
Rethinking Learning Approaches for Long-Term Action Anticipation Megha Nawhal Akash Abdu Jyothi Greg Mori AI4TS 34 26 0 20 Oct 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain Francesco Ragusa Antonino Furnari G. Farinella EgoV 38 23 0 19 Sep 2022
Video Question Answering with Iterative Video-Text Co-Tokenization A. Piergiovanni K. Morton Weicheng Kuo Michael S. Ryoo A. Angelova 20 17 0 01 Aug 2022
SVGraph: Learning Semantic Graphs from Instructional Videos Madeline Chantry Schiappa Y. S. Rawat 17 4 0 16 Jul 2022
Temporal Relevance Analysis for Video Action Models Quanfu Fan Donghyun Kim Chun-Fu Chen Chen Stan Sclaroff Kate Saenko Sarah Adel Bargal FAtt 22 0 0 25 Apr 2022
Long Movie Clip Classification with State-Space Video Models Md. Mohaiminul Islam Gedas Bertasius VLM 38 101 0 04 Apr 2022
Beyond Fixation: Dynamic Window Visual Transformer Pengzhen Ren Changlin Li Guangrun Wang Yun Xiao Qing Du Xiaodan Liang Qing Du Xiaodan Liang Xiaojun Chang ViT 20 32 0 24 Mar 2022
Gate-Shift-Fuse for Video Action Recognition Swathikiran Sudhakaran Sergio Escalera O. Lanz 20 22 0 16 Mar 2022
Learning To Recognize Procedural Activities with Distant Supervision Xudong Lin Fabio Petroni Gedas Bertasius Marcus Rohrbach Shih-Fu Chang Lorenzo Torresani 22 82 0 26 Jan 2022
Temporal-attentive Covariance Pooling Networks for Video Recognition Zilin Gao Qilong Wang Bingbing Zhang Q. Hu P. Li 18 24 0 27 Oct 2021
ASFormer: Transformer for Action Segmentation Fangqiu Yi Hongyu Wen Tingting Jiang ViT 71 172 0 16 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,018 0 13 Oct 2021
ActionCLIP: A New Paradigm for Video Action Recognition Mengmeng Wang Jiazheng Xing Yong Liu VLM 149 362 0 17 Sep 2021
Towards Long-Form Video Understanding Chaoxia Wu Philipp Krahenbuhl VLM ViT 36 165 0 21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? Michael S. Ryoo A. Piergiovanni Anurag Arnab Mostafa Dehghani A. Angelova ViT 21 127 0 21 Jun 2021
Coarse to Fine Multi-Resolution Temporal Convolutional Network Dipika Singhania R. Rahaman Angela Yao AI4TS 16 55 0 23 May 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living Srijan Das Rui Dai Di Yang F. Brémond ViT 36 66 0 17 May 2021
Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities S. Swetha Hilde Kuehne Y. S. Rawat M. Shah 24 16 0 30 Apr 2021
Multiscale Vision Transformers Haoqi Fan Bo Xiong K. Mangalam Yanghao Li Zhicheng Yan Jitendra Malik Christoph Feichtenhofer ViT 19 1,221 0 22 Apr 2021
No frame left behind: Full Video Action Recognition X. Liu S. Pintea F. Karimi Nejadasl O. Booij J. C. V. Gemert 19 40 0 29 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 16 33 0 18 Mar 2021
Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition Zachary Wharton Ardhendu Behera Yonghuai Liu Nikolaos Bessis 39 35 0 17 Jan 2021
GTA: Global Temporal Attention for Video Action Understanding Bo He Xitong Yang Zuxuan Wu Hao Chen Ser-Nam Lim Abhinav Shrivastava ViT 33 27 0 15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong Chongruo Wu Zhi-Li Zhang Joseph Tighe R. Manmatha Mu Li VLM AI4TS 30 184 0 11 Dec 2020
t-EVA: Time-Efficient t-SNE Video Annotation Soroosh Poorgholi O. Kayhan J. C. V. Gemert 9 5 0 26 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning Zehua Zhang David J. Crandall AI4TS SSL 23 23 0 23 Nov 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition Chun-Fu Chen Rameswar Panda K. Ramakrishnan Rogerio Feris J. M. Cohn A. Oliva Quanfu Fan 21 95 0 22 Oct 2020
Motion Prediction Using Temporal Inception Module Tim Lebailly Sena Kiciroglu Mathieu Salzmann Pascal Fua Wei Wang 38 37 0 06 Oct 2020
Project RISE: Recognizing Industrial Smoke Emissions Yen-Chia Hsu Ting-Hao 'Kenneth' Huang Ting-Yao Hu P. Dille Sean Prendi Ryan N. Hoffman Anastasia Tsuhlares Jessica Pachuta Randy Sargent I. Nourbakhsh 20 19 0 13 May 2020
X3D: Expanding Architectures for Efficient Video Recognition Christoph Feichtenhofer 66 998 0 09 Apr 2020
Actor-Transformers for Group Activity Recognition Kirill Gavrilyuk Ryan Sanford Mehrsan Javan Cees G. M. Snoek ViT 19 178 0 28 Mar 2020
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities Noureldien Hussein E. Gavves A. Smeulders VLM 18 13 0 18 Mar 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 194 205 0 23 Jan 2020
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs Jingwei Ji Ranjay Krishna Li Fei-Fei Juan Carlos Niebles 39 335 0 15 Dec 2019
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation Quanfu Fan Chun-Fu Chen Hilde Kuehne Marco Pistoia David D. Cox 24 126 0 02 Dec 2019
PISEP^2: Pseudo Image Sequence Evolution based 3D Pose Prediction Xiaoli Liu Jianqin Yin Huaping Liu Yilong Yin 3DH 29 7 0 04 Sep 2019
Relational Long Short-Term Memory for Video Action Recognition Zexi Chen B. Ramachandra Tianfu Wu Ranga Raju Vatsavai 17 5 0 16 Nov 2018