ActionVLAD: Learning spatio-temporal aggregation for action classification

10 April 2017

Papers citing "ActionVLAD: Learning spatio-temporal aggregation for action classification"

Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis Amir Hosein Fadaei M. Dehaqani 45 0 0 11 Feb 2025
Situational Scene Graph for Structured Human-centric Situation Understanding Chinthani Sugandhika Chen Li Deepu Rajan Basura Fernando 209 1 0 30 Oct 2024
Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation Swati Jindal Mohit Yadav Roberto Manduchi 37 5 0 08 Apr 2024
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition Jiaming Zhou Hanjun Li Kun-Yu Lin Junwei Liang 29 1 0 28 Nov 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao Shijie Wang Ce Zhang Changcheng Fu Minh Quan Do Nakul Agarwal Kwonjoon Lee Chen Sun LM&Ro 56 49 0 31 Jul 2023
HierVL: Learning Hierarchical Video-Language Embeddings Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman VLM AI4TS 26 53 0 05 Jan 2023
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation Dipika Singhania R. Rahaman Angela Yao 16 28 0 20 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data Roei Herzig Ofir Abramovich Elad Ben-Avraham Assaf Arbelle Leonid Karlinsky Ariel Shamir Trevor Darrell Amir Globerson 41 16 0 08 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition Santosh Kumar Yadav Achleshwar Luthra Esha Pahwa K. Tiwari Heena Rathore Hari Mohan Pandey Peter Corcoran 34 12 0 07 Dec 2022
PatchBlender: A Motion Prior for Video Transformers Gabriele Prato Yale Song Janarthanan Rajendran R. Devon Hjelm Neel Joshi Sarath Chandar ViT 27 0 0 11 Nov 2022
SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity Recognition Santosh Kumar Yadav Esha Pahwa Achleshwar Luthra K. Tiwari Hari Mohan Pandey Peter Corcoran 23 4 0 10 Nov 2022
Rethinking Learning Approaches for Long-Term Action Anticipation Megha Nawhal Akash Abdu Jyothi Greg Mori AI4TS 39 26 0 20 Oct 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer Muhammad Bilal Shaikh Douglas Chai S. Islam Naveed Akhtar 32 5 0 11 Sep 2022
Surgical Skill Assessment via Video Semantic Aggregation Zhenqiang Li Lin Gu Weimin Wang Ryosuke Nakamura Yoichi Sato 28 13 0 04 Aug 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network Nikolaos Gkalelis Dimitrios Daskalakis Vasileios Mezaris 19 10 0 20 Jul 2022
Gate-Shift-Fuse for Video Action Recognition Swathikiran Sudhakaran Sergio Escalera Oswald Lanz 22 22 0 16 Mar 2022
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation Jiong Wang Zhou Zhao Weike Jin Xinyu Duan Zhen Lei Baoxing Huai Yiling Wu Xiaofei He CVBM AAML VLM 24 12 0 21 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition Chao-Yuan Wu Yanghao Li K. Mangalam Haoqi Fan Bo Xiong Jitendra Malik Christoph Feichtenhofer ViT 48 198 0 20 Jan 2022
Temporal-attentive Covariance Pooling Networks for Video Recognition Zilin Gao Qilong Wang Bingbing Zhang Q. Hu P. Li 21 24 0 27 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition M. C. Leong Hui Li Tan Haosong Zhang Liyuan Li Feng Lin J. Lim 40 10 0 12 Oct 2021
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation Jay Patravali Gaurav Mittal Ye Yu Fuxin Li Mei Chen 18 19 0 30 Sep 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device Ji Lin Chuang Gan Kuan-Chieh Jackson Wang Song Han 40 64 0 27 Sep 2021
Social Fabric: Tubelet Compositions for Video Relation Detection Shuo Chen Zenglin Shi Pascal Mettes Cees G. M. Snoek ViT 36 21 0 18 Aug 2021
Towards Long-Form Video Understanding Chao-Yuan Wu Philipp Krähenbühl VLM ViT 49 166 0 21 Jun 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning Wenhao Wu Yuxiang Zhao Yanwu Xu Xiao Tan Dongliang He ... Jinxing Ye Yingying Li Mingde Yao Zichao Dong Yifeng Shi AI4TS 30 27 0 25 May 2021
Coarse to Fine Multi-Resolution Temporal Convolutional Network Dipika Singhania R. Rahaman Angela Yao AI4TS 16 55 0 23 May 2021
VidTr: Video Transformer Without Convolutions Yanyi Zhang Xinyu Li Chunhui Liu Bing Shuai Yi Zhu Biagio Brattoli Hao Chen I. Marsic Joseph Tighe ViT 148 193 0 23 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval Xiaohan Wang Linchao Zhu Yi Yang 170 170 0 20 Apr 2021
Higher Order Recurrent Space-Time Transformer for Video Action Prediction Tsung-Ming Tai G. Fiameni Cheng-Kuang Lee Oswald Lanz 36 9 0 17 Apr 2021
No frame left behind: Full Video Action Recognition X. Liu S. Pintea F. Karimi Nejadasl Olaf Booij Jan van Gemert 21 40 0 29 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 29 33 0 18 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries Swathikiran Sudhakaran Sergio Escalera Oswald Lanz EgoV 27 15 0 16 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 283 1,984 0 09 Feb 2021
Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition Zachary Wharton Ardhendu Behera Yonghuai Liu Nikolaos Bessis 39 35 0 17 Jan 2021
GTA: Global Temporal Attention for Video Action Understanding Bo He Xitong Yang Zuxuan Wu Hao Chen Ser-Nam Lim Abhinav Shrivastava ViT 33 27 0 15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong Chongruo Wu Zhi-Li Zhang Joseph Tighe R. Manmatha Mu Li VLM AI4TS 38 185 0 11 Dec 2020
t-EVA: Time-Efficient t-SNE Video Annotation Soroosh Poorgholi O. Kayhan Jan van Gemert 16 5 0 26 Nov 2020
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data A. Mohammed I. Farup Marius Pedersen Sule Yayilgan Yildirim Ø. Hovde 34 18 0 22 Nov 2020
Video Big Data Analytics in the Cloud: A Reference Architecture, Survey, Opportunities, and Open Research Issues A. Alam I. Ullah Young-Koo Lee 42 22 0 16 Nov 2020
Improved Soccer Action Spotting using both Audio and Video Streams Bastien Vanderplaetse Stéphane Dupont 41 42 0 09 Nov 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition Chun-Fu Chen Yikang Shen K. Ramakrishnan Rogerio Feris J. M. Cohn A. Oliva Quanfu Fan 23 95 0 22 Oct 2020
Learning to Sort Image Sequences via Accumulated Temporal Differences Gagan Kanojia Shanmuganathan Raman 19 0 0 22 Oct 2020
AssembleNet++: Assembling Modality Representations via Attention Connections Michael S. Ryoo A. Piergiovanni Juhana Kangaspunta A. Angelova 15 44 0 18 Aug 2020
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance Can Zhang Yuexian Zou Guang Chen Lei Gan 15 39 0 08 Aug 2020
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition M. E. Kalfaoglu Sinan Kalkan A. Aydin Alatan 3DPC 39 140 0 03 Aug 2020
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation Yongqin Xian Bruno Korbar Matthijs Douze Lorenzo Torresani Bernt Schiele Zeynep Akata VGen 18 18 0 09 Jul 2020
TEA: Temporal Excitation and Aggregation for Action Recognition Yan-Ran Li Bin Ji Xintian Shi Jianguo Zhang Bin Kang Limin Wang ViT 25 439 0 03 Apr 2020
Learning Interactions and Relationships between Movie Characters Anna Kukleva Makarand Tapaswi Ivan Laptev 41 51 0 29 Mar 2020
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities Noureldien Hussein E. Gavves A. Smeulders VLM 26 13 0 18 Mar 2020
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition Wenhao Wu Dongliang He Xiao Tan Shifeng Chen Yi Yang Shilei Wen 24 35 0 09 Feb 2020