ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1704.02895
  4. Cited By
ActionVLAD: Learning spatio-temporal aggregation for action
  classification

ActionVLAD: Learning spatio-temporal aggregation for action classification

10 April 2017
Rohit Girdhar
Deva Ramanan
Abhinav Gupta
Josef Sivic
Bryan C. Russell
    AI4TS
ArXivPDFHTML

Papers citing "ActionVLAD: Learning spatio-temporal aggregation for action classification"

50 / 93 papers shown
Title
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
Situational Scene Graph for Structured Human-centric Situation Understanding
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
209
1
0
30 Oct 2024
Spatio-Temporal Attention and Gaussian Processes for Personalized Video
  Gaze Estimation
Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
Swati Jindal
Mohit Yadav
Roberto Manduchi
37
5
0
08 Apr 2024
Towards Weakly Supervised End-to-end Learning for Long-video Action
  Recognition
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
29
1
0
28 Nov 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation
  from Videos?
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
56
49
0
31 Jul 2023
HierVL: Learning Hierarchical Video-Language Embeddings
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
26
53
0
05 Jan 2023
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action
  Segmentation
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
16
28
0
20 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers
  using Synthetic Scene Data
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
41
16
0
08 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera
  Based Activity Recognition
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Santosh Kumar Yadav
Achleshwar Luthra
Esha Pahwa
K. Tiwari
Heena Rathore
Hari Mohan Pandey
Peter Corcoran
34
12
0
07 Dec 2022
PatchBlender: A Motion Prior for Video Transformers
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
27
0
0
11 Nov 2022
SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity
  Recognition
SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity Recognition
Santosh Kumar Yadav
Esha Pahwa
Achleshwar Luthra
K. Tiwari
Hari Mohan Pandey
Peter Corcoran
23
4
0
10 Nov 2022
Rethinking Learning Approaches for Long-Term Action Anticipation
Rethinking Learning Approaches for Long-Term Action Anticipation
Megha Nawhal
Akash Abdu Jyothi
Greg Mori
AI4TS
39
26
0
20 Oct 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
32
5
0
11 Sep 2022
Surgical Skill Assessment via Video Semantic Aggregation
Surgical Skill Assessment via Video Semantic Aggregation
Zhenqiang Li
Lin Gu
Weimin Wang
Ryosuke Nakamura
Yoichi Sato
28
13
0
04 Aug 2022
ViGAT: Bottom-up event recognition and explanation in video using
  factorized graph attention network
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
19
10
0
20 Jul 2022
Gate-Shift-Fuse for Video Action Recognition
Gate-Shift-Fuse for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
22
22
0
16 Mar 2022
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with
  Vocabulary Separation and Adaptation
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation
Jiong Wang
Zhou Zhao
Weike Jin
Xinyu Duan
Zhen Lei
Baoxing Huai
Yiling Wu
Xiaofei He
CVBM
AAML
VLM
24
12
0
21 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient
  Long-Term Video Recognition
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
198
0
20 Jan 2022
Temporal-attentive Covariance Pooling Networks for Video Recognition
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
24
0
27 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human
  Action Recognition
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
40
10
0
12 Oct 2021
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned
  Meta-Adaptation
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
Jay Patravali
Gaurav Mittal
Ye Yu
Fuxin Li
Mei Chen
18
19
0
30 Sep 2021
TSM: Temporal Shift Module for Efficient and Scalable Video
  Understanding on Edge Device
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Ji Lin
Chuang Gan
Kuan-Chieh Jackson Wang
Song Han
40
64
0
27 Sep 2021
Social Fabric: Tubelet Compositions for Video Relation Detection
Social Fabric: Tubelet Compositions for Video Relation Detection
Shuo Chen
Zenglin Shi
Pascal Mettes
Cees G. M. Snoek
ViT
36
21
0
18 Aug 2021
Towards Long-Form Video Understanding
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
49
166
0
21 Jun 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level
  Representation Learning
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
30
27
0
25 May 2021
Coarse to Fine Multi-Resolution Temporal Convolutional Network
Coarse to Fine Multi-Resolution Temporal Convolutional Network
Dipika Singhania
R. Rahaman
Angela Yao
AI4TS
16
55
0
23 May 2021
VidTr: Video Transformer Without Convolutions
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
148
193
0
23 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
170
170
0
20 Apr 2021
Higher Order Recurrent Space-Time Transformer for Video Action
  Prediction
Higher Order Recurrent Space-Time Transformer for Video Action Prediction
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Oswald Lanz
36
9
0
17 Apr 2021
No frame left behind: Full Video Action Recognition
No frame left behind: Full Video Action Recognition
X. Liu
S. Pintea
F. Karimi Nejadasl
Olaf Booij
Jan van Gemert
21
40
0
29 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation
  Learning
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with
  Attention Dictionaries
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
27
15
0
16 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,984
0
09 Feb 2021
Coarse Temporal Attention Network (CTA-Net) for Driver's Activity
  Recognition
Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition
Zachary Wharton
Ardhendu Behera
Yonghuai Liu
Nikolaos Bessis
39
35
0
17 Jan 2021
GTA: Global Temporal Attention for Video Action Understanding
GTA: Global Temporal Attention for Video Action Understanding
Bo He
Xitong Yang
Zuxuan Wu
Hao Chen
Ser-Nam Lim
Abhinav Shrivastava
ViT
33
27
0
15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
t-EVA: Time-Efficient t-SNE Video Annotation
t-EVA: Time-Efficient t-SNE Video Annotation
Soroosh Poorgholi
O. Kayhan
Jan van Gemert
16
5
0
26 Nov 2020
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule
  endoscopy based on weakly labeled data
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data
A. Mohammed
I. Farup
Marius Pedersen
Sule YAYILGAN YILDIRIM
Ø. Hovde
34
18
0
22 Nov 2020
Video Big Data Analytics in the Cloud: A Reference Architecture, Survey,
  Opportunities, and Open Research Issues
Video Big Data Analytics in the Cloud: A Reference Architecture, Survey, Opportunities, and Open Research Issues
A. Alam
I. Ullah
Young-Koo Lee
42
22
0
16 Nov 2020
Improved Soccer Action Spotting using both Audio and Video Streams
Improved Soccer Action Spotting using both Audio and Video Streams
Bastien Vanderplaetse
Stéphane Dupont
41
42
0
09 Nov 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action
  Recognition
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
23
95
0
22 Oct 2020
Learning to Sort Image Sequences via Accumulated Temporal Differences
Learning to Sort Image Sequences via Accumulated Temporal Differences
Gagan Kanojia
Shanmuganathan Raman
19
0
0
22 Oct 2020
AssembleNet++: Assembling Modality Representations via Attention
  Connections
AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo
A. Piergiovanni
Juhana Kangaspunta
A. Angelova
15
44
0
18 Aug 2020
PAN: Towards Fast Action Recognition via Learning Persistence of
  Appearance
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance
Can Zhang
Yuexian Zou
Guang Chen
Lei Gan
15
39
0
08 Aug 2020
Late Temporal Modeling in 3D CNN Architectures with BERT for Action
  Recognition
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition
M. E. Kalfaoglu
Sinan Kalkan
A. Aydin Alatan
3DPC
39
140
0
03 Aug 2020
Generalized Few-Shot Video Classification with Video Retrieval and
  Feature Generation
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation
Yongqin Xian
Bruno Korbar
Matthijs Douze
Lorenzo Torresani
Bernt Schiele
Zeynep Akata
VGen
18
18
0
09 Jul 2020
TEA: Temporal Excitation and Aggregation for Action Recognition
TEA: Temporal Excitation and Aggregation for Action Recognition
Yan-Ran Li
Bin Ji
Xintian Shi
Jianguo Zhang
Bin Kang
Limin Wang
ViT
25
439
0
03 Apr 2020
Learning Interactions and Relationships between Movie Characters
Learning Interactions and Relationships between Movie Characters
Anna Kukleva
Makarand Tapaswi
Ivan Laptev
41
51
0
29 Mar 2020
PIC: Permutation Invariant Convolution for Recognizing Long-range
  Activities
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities
Noureldien Hussein
E. Gavves
A. Smeulders
VLM
26
13
0
18 Mar 2020
Dynamic Inference: A New Approach Toward Efficient Video Action
  Recognition
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
Wenhao Wu
Dongliang He
Xiao Tan
Shifeng Chen
Yi Yang
Shilei Wen
24
35
0
09 Feb 2020
12
Next