ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.02155
  4. Cited By
Spatiotemporal Residual Networks for Video Action Recognition

Spatiotemporal Residual Networks for Video Action Recognition

7 November 2016
Christoph Feichtenhofer
A. Pinz
Richard P. Wildes
ArXivPDFHTML

Papers citing "Spatiotemporal Residual Networks for Video Action Recognition"

50 / 273 papers shown
Title
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
73
1
0
01 Apr 2025
Exploring Simple Siamese Network for High-Resolution Video Quality Assessment
Guotao Shen
Ziheng Yan
Xin Jin
Longhai Wu
Jie Chen
Ilhyun Cho
Cheul-hee Hahm
40
0
0
04 Mar 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
69
24
0
31 Dec 2024
Scaling 4D Representations
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
85
3
0
19 Dec 2024
VidCompress: Memory-Enhanced Temporal Compression for Video
  Understanding in Large Language Models
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
46
2
0
15 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
33
0
0
08 Oct 2024
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for
  Efficient Action Recognition
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Shiting Xiao
Yuhang Li
Youngeun Kim
Donghyun Lee
Priyadarshini Panda
44
1
0
03 Sep 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal
  Omni-Scale Feature Learning
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
28
10
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
40
1
0
08 Aug 2024
Pose-guided multi-task video transformer for driver action recognition
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
42
0
0
18 Jul 2024
Efficient Event Stream Super-Resolution with Recursive Multi-Branch
  Fusion
Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion
Quanmin Liang
Zhilin Huang
Xiawu Zheng
Feidiao Yang
Jun Peng
Kai Huang
Yonghong Tian
46
1
0
28 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video
  Action Recognition
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
57
4
0
21 Jun 2024
A spatiotemporal style transfer algorithm for dynamic visual stimulus
  generation
A spatiotemporal style transfer algorithm for dynamic visual stimulus generation
Antonino Greco
Markus Siegel
25
2
0
07 Mar 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
95
57
0
08 Feb 2024
Taylor Videos for Action Recognition
Taylor Videos for Action Recognition
Lei Wang
Xiuyuan Yuan
Tom Gedeon
Liang Zheng
26
6
0
05 Feb 2024
Classification of Tennis Actions Using Deep Learning
Classification of Tennis Actions Using Deep Learning
Emil Hovad
Therese Hougaard-Jensen
L. H. Clemmensen
14
5
0
04 Feb 2024
EZ-CLIP: Efficient Zeroshot Video Action Recognition
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Yogesh S Rawat
VLM
36
7
0
13 Dec 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction
  Understanding
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
Improving Video Violence Recognition with Human Interaction Learning on
  3D Skeleton Point Clouds
Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds
Yukun Su
Guosheng Lin
Qingyao Wu
3DH
3DPC
29
3
0
26 Aug 2023
Spatial-Temporal Alignment Network for Action Recognition
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye
Junwei Liang
3DPC
29
1
0
19 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings
  for Video Action Recognition
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
S. Chaudhuri
Saumik Bhattacharya
27
3
0
07 Aug 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action
  Recognition
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
Boosting Breast Ultrasound Video Classification by the Guidance of
  Keyframe Feature Centers
Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers
AnLan Sun
Zhao Zhang
Meng Lei
Yuting Dai
Dong Wang
Liwei Wang
34
5
0
12 Jun 2023
Cross-view Action Recognition Understanding From Exocentric to
  Egocentric Perspective
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
41
10
0
25 May 2023
Unified Keypoint-based Action Recognition Framework via Structured
  Keypoint Pooling
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
29
37
0
27 Mar 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition
  with Pre-trained Vision-Language Models
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
100
48
0
31 Dec 2022
Deep set conditioned latent representations for action recognition
Deep set conditioned latent representations for action recognition
Akash Singh
Tom De Schepper
Kevin Mets
P. Hellinckx
José Oramas
Steven Latré
BDL
19
2
0
21 Dec 2022
MAViL: Masked Audio-Video Learners
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
26
51
0
15 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
CLIP
VLM
34
150
0
06 Dec 2022
Dynamic Appearance: A Video Representation for Action Recognition with
  Joint Training
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
Deep Unsupervised Key Frame Extraction for Efficient Video
  Classification
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
Hao Tang
L. Ding
Songsong Wu
Bin Ren
N. Sebe
Paolo Rota
22
27
0
12 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
27
0
0
11 Nov 2022
Two-Stream Network for Sign Language Recognition and Translation
Two-Stream Network for Sign Language Recognition and Translation
Yutong Chen
Ronglai Zuo
Fangyun Wei
Yu-Huan Wu
Shujie Liu
Brian Mak
SLR
42
118
0
02 Nov 2022
Multimodal Neural Network For Demand Forecasting
Multimodal Neural Network For Demand Forecasting
Nitesh Kumar
K. Dheenadayalan
Suprabath Reddy
Sumant Kulkarni
AI4TS
19
4
0
20 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering
  via Decoupling Spatial-Temporal Modeling
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
27
2
0
08 Oct 2022
Multi-dataset Training of Transformers for Robust Action Recognition
Multi-dataset Training of Transformers for Robust Action Recognition
Junwei Liang
Enwei Zhang
Jun Zhang
Chunhua Shen
ViT
45
11
0
26 Sep 2022
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial
  Video Classification
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification
P. Jin
Lichao Mou
Yuansheng Hua
Gui-Song Xia
Xiao Xiang Zhu
AI4TS
24
8
0
22 Sep 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior
  Understanding in the Industrial-like Domain
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Francesco Ragusa
Antonino Furnari
G. Farinella
EgoV
43
24
0
19 Sep 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video
  Recognition
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
18
2
0
15 Sep 2022
Adaptive Perception Transformer for Temporal Action Localization
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
21
3
0
25 Aug 2022
Self-Contained Entity Discovery from Captioned Videos
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
28
2
0
13 Aug 2022
Video-based Human Action Recognition using Deep Learning: A Review
Video-based Human Action Recognition using Deep Learning: A Review
Hieu H. Pham
L. Khoudour
Alain Crouzil
Pablo Zegers
S. Velastín
35
34
0
07 Aug 2022
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for
  Action Recognition
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Xiangpei Hua
Lei Zhang
ViT
30
27
0
27 Jul 2022
Masked Autoencoders that Listen
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
21
268
0
13 Jul 2022
Analysis and Extensions of Adversarial Training for Video Classification
Analysis and Extensions of Adversarial Training for Video Classification
K. A. Kinfu
René Vidal
AAML
33
13
0
16 Jun 2022
PrivHAR: Recognizing Human Actions From Privacy-preserving Lens
PrivHAR: Recognizing Human Actions From Privacy-preserving Lens
Carlos Hinojosa
M. Márquez
Henry Arguello
Ehsan Adeli
L. Fei-Fei
Juan Carlos Niebles
PICV
28
20
0
08 Jun 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
30
0
0
03 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Yang Liu
Y. Tan
Haoyu Lan
SSL
47
6
0
28 Apr 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and
  Applications
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Yujun Lin
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
27
107
0
25 Apr 2022
123456
Next