1611.02155
Cited By
Spatiotemporal Residual Networks for Video Action Recognition
7 November 2016
Christoph Feichtenhofer
A. Pinz
Richard P. Wildes
Papers citing
"Spatiotemporal Residual Networks for Video Action Recognition"
50 / 273 papers shown
Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis
Sai Varun Kodathala
Rakesh Vunnam
129
0
0
25 Sep 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
568
3
0
01 Apr 2025
Exploring Simple Siamese Network for High-Resolution Video Quality Assessment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Guotao Shen
Ziheng Yan
Xin Jin
Longhai Wu
Jie Chen
Ilhyun Cho
Cheul-hee Hahm
185
0
0
04 Mar 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Computer Vision and Pattern Recognition (CVPR), 2023
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Celine Lee
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
279
39
0
31 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
440
19
0
19 Dec 2024
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
181
4
0
15 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
Michael R. Lyu
Liwei Wang
VLM
192
7
0
08 Oct 2024
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Neuromorphic Computing and Engineering (NCE), 2024
Shiting Xiao
Yuhang Li
Youngeun Kim
Donghyun Lee
Priyadarshini Panda
237
6
0
03 Sep 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
European Conference on Artificial Intelligence (ECAI), 2024
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
241
14
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Conference on Multimedia Information Processing and Retrieval (MIPR), 2024
Rex Liu
Xin Liu
267
1
0
08 Aug 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
195
1
0
18 Jul 2024
Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion
Quanmin Liang
Zhilin Huang
Xiawu Zheng
Feidiao Yang
Jun Peng
Kai Huang
Yonghong Tian
221
5
0
28 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
238
8
0
21 Jun 2024
A spatiotemporal style transfer algorithm for dynamic visual stimulus generation
Nature Computational Science (Nat. Comput. Sci.), 2024
Antonino Greco
Markus Siegel
225
7
0
07 Mar 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
492
101
0
08 Feb 2024
Taylor Videos for Action Recognition
International Conference on Machine Learning (ICML), 2024
Lei Wang
Xiuyuan Yuan
Tom Gedeon
Liang Zheng
555
13
0
05 Feb 2024
Classification of Tennis Actions Using Deep Learning
Emil Hovad
Therese Hougaard-Jensen
L. H. Clemmensen
70
6
0
04 Feb 2024
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Yogesh S Rawat
VLM
278
11
0
13 Dec 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
IEEE International Conference on Computer Vision (ICCV), 2023
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
175
12
0
05 Sep 2023
Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds
Qingxin Xiao
Guosheng Lin
Qingyao Wu
3DH
3DPC
197
5
0
26 Aug 2023
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye
Junwei Liang
3DPC
166
2
0
19 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP), 2023
S. Chaudhuri
Saumik Bhattacharya
177
6
0
07 Aug 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
212
17
0
18 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
257
27
0
13 Jul 2023
Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
AnLan Sun
Zhao Zhang
Meng Lei
Yuting Dai
Dong Wang
Liwei Wang
153
12
0
12 Jun 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Neurocomputing (Neurocomputing), 2023
Thanh-Dat Truong
Khoa Luu
EgoV
389
15
0
25 May 2023
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Computer Vision and Pattern Recognition (CVPR), 2023
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
217
46
0
27 Mar 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
395
80
0
31 Dec 2022
Deep set conditioned latent representations for action recognition
VISIGRAPP (VISIGRAPP), 2022
Akash Singh
Tom De Schepper
Kevin Mets
P. Hellinckx
José Oramas
Steven Latré
BDL
168
2
0
21 Dec 2022
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
322
73
0
15 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
Computer Vision and Pattern Recognition (CVPR), 2022
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
CLIP
VLM
404
225
0
06 Dec 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
176
1
0
23 Nov 2022
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
Hao Tang
L. Ding
Songsong Wu
Bin Ren
Andrii Zadaianchuk
Paolo Rota
103
45
0
12 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
185
0
0
11 Nov 2022
Two-Stream Network for Sign Language Recognition and Translation
Neural Information Processing Systems (NeurIPS), 2022
Yutong Chen
Ronglai Zuo
Fangyun Wei
Yu-Huan Wu
Shujie Liu
Brian Mak
SLR
241
196
0
02 Nov 2022
Multimodal Neural Network For Demand Forecasting
International Conference on Neural Information Processing (ICONIP), 2022
Nitesh Kumar
K. Dheenadayalan
Suprabath Reddy
Sumant Kulkarni
AI4TS
114
8
0
20 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
312
2
0
08 Oct 2022
Multi-dataset Training of Transformers for Robust Action Recognition
Neural Information Processing Systems (NeurIPS), 2022
Junwei Liang
Enwei Zhang
Jun Zhang
Chunhua Shen
ViT
251
14
0
26 Sep 2022
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
P. Jin
Lichao Mou
Yuansheng Hua
Gui-Song Xia
Xiao Xiang Zhu
AI4TS
254
15
0
22 Sep 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Computer Vision and Image Understanding (CVIU), 2022
Francesco Ragusa
Antonino Furnari
G. Farinella
EgoV
225
40
0
19 Sep 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
258
3
0
15 Sep 2022
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
226
3
0
25 Aug 2022
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
152
3
0
13 Aug 2022
Video-based Human Action Recognition using Deep Learning: A Review
Hieu H. Pham
L. Khoudour
Alain Crouzil
Pablo Zegers
S. Velastín
173
43
0
07 Aug 2022
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
European Conference on Computer Vision (ECCV), 2022
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Xiangpei Hua
Lei Zhang
ViT
152
43
0
27 Jul 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
536
388
0
13 Jul 2022
Analysis and Extensions of Adversarial Training for Video Classification
K. A. Kinfu
René Vidal
AAML
223
14
0
16 Jun 2022
PrivHAR: Recognizing Human Actions From Privacy-preserving Lens
European Conference on Computer Vision (ECCV), 2022
Carlos Hinojosa
M. Márquez
Henry Arguello
Ehsan Adeli
L. Fei-Fei
Juan Carlos Niebles
PICV
250
26
0
08 Jun 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
European Conference on Computer Vision (ECCV), 2022
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
162
1
0
03 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
IEEE International Conference on Image Processing (ICIP), 2022
Yang Liu
Y. Tan
Haoyu Lan
SSL
213
9
0
28 Apr 2022