ResearchTrend.AI

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 674 papers shown
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
170
0
0
18 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
199
26
0
15 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
221
23
0
11 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
328
11
0
08 Jul 2024
Open-Event Procedure Planning in Instructional Videos
Yilu Wu
Hanlin Wang
Jing Wang
Limin Wang
239
1
0
06 Jul 2024
IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale
Wei Gao
Bo Ai
Joel Loo
Vinay
David Hsu
292
3
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
218
10
0
03 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
187
7
0
21 Jun 2024
Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang
Xin Zhang
Yixuan Xu
Xuesong Wang
Xianyu Wu
236
0
0
17 Jun 2024
PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification
Magdalena Trędowicz
Marcin Mazur
Szymon Janusz
Arkadiusz Lewicki
Jacek Tabor
Łukasz Struski
290
1
0
17 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
IEEE Transactions on Image Processing (TIP), 2024
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
270
12
0
15 Jun 2024
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
Stefan Gerd Fritsch
Cennet Oğuz
Vitor Fortes Rey
L. Ray
Maximilian Kiefer-Emmanouilidis
Paul Lukowicz
HAI
407
3
0
06 Jun 2024
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Khaled Alomar
Halil Ibrahim Aysel
Xiaohao Cai
MedIm
ViT
252
21
0
02 Jun 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
174
20
0
31 May 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
303
7
0
30 May 2024
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
Mohit Prabhushankar
Ghassan AlRegib
UQCV
203
0
0
22 May 2024
Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance
Kaifeng Zhang
Zhao-Heng Yin
Weirui Ye
Yang Gao
375
6
0
22 May 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
394
7
0
21 May 2024
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Yingjie Zhai
Wenshuo Li
Yehui Tang
Xinghao Chen
Yunhe Wang
ViT
203
2
0
14 May 2024
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model
Yang Jin
Jun Lv
Shuqiang Jiang
Cewu Lu
270
1
0
12 May 2024
Deep video representation learning: a survey
Elham Ravanbakhsh
Yongqing Liang
J. Ramanujam
Xin Li
163
5
0
10 May 2024
Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation
Mo Guan
Yan Wang
Guangkun Ma
Jiarui Liu
Mingzu Sun
SLR
206
13
0
09 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
148
3
0
09 May 2024
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method
Peisong He
Leyao Zhu
Jiaxing Li
Shiqi Wang
Haoliang Li
EGVM
212
5
0
07 May 2024
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News
Zhe Niu
Ronglai Zuo
Brian Mak
Fangyun Wei
159
6
0
02 May 2024
SFMViT: SlowFast Meet ViT in Chaotic World
Jiaying Lin
Jiajun Wen
Mengyuan Liu
Jinfu Liu
Baiqiao Yin
Yue Li
ViT
178
1
0
25 Apr 2024
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang
Sule Bai
Guangyi Chen
Lei Chen
Jiwen Lu
Junle Wang
Yansong Tang
235
19
0
22 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
198
36
0
15 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
267
14
0
06 Apr 2024
A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection
Chih-Chung Hsu
Chia-Ming Lee
Chiang Fan Yang
Yi-Shiuan Chou
Chih-Yu Jiang
Shen-Chieh Tai
Chin-Han Tsai
195
3
0
02 Apr 2024
LORD: Large Models based Opposite Reward Design for Autonomous Driving
Xin Ye
Feng Tao
Abhirup Mallik
Burhaneddin Yaman
Liu Ren
OffRL
270
7
0
27 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
190
5
0
24 Mar 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Hanlei Zhang
Xin Wang
Hua Xu
Qianrui Zhou
Kai Gao
Jianhua Su
jinyue Zhao
Wenrui Li
Yanting Chen
510
20
0
16 Mar 2024
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Zhixiu Lu
Hailong Li
N. Parikh
Jonathan R. Dillman
Lili He
MedIm
VLM
394
5
0
15 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition
European Conference on Computer Vision (ECCV), 2024
Md Salman Shamil
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
191
10
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
233
122
0
14 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
228
2
0
11 Mar 2024
A spatiotemporal style transfer algorithm for dynamic visual stimulus generation
Nature Computational Science (Nat. Comput. Sci.), 2024
Antonino Greco
Markus Siegel
193
7
0
07 Mar 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Jun Xiong
Peng Zhang
Tao You
Chuanyue Li
Wei Huang
Yufei Zha
DiffM
190
13
0
02 Mar 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
221
6
0
12 Feb 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
International Conference on Machine Learning (ICML), 2024
Yufei Wang
Zhanyi Sun
Jesse Zhang
Zhou Xian
Erdem Biyik
David Held
Zackory M. Erickson
VLM
293
104
0
06 Feb 2024
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks
Xingning Dong
Qingpei Guo
Tian Gan
Qing Wang
Yue Yu
Xiangyuan Ren
Yuan Cheng
Wei Chu
206
6
0
31 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
384
14
0
29 Jan 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
215
50
0
29 Jan 2024
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
European Conference on Computer Vision (ECCV), 2024
Shuokang Huang
Kaihan Li
Di You
Yichong Chen
Arvin Lin
Siying Liu
Xiaohui Li
Julie A. McCann
190
25
0
24 Jan 2024
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
British Machine Vision Conference (BMVC), 2024
Hao Chen
Jiaze Wang
Ziyu Guo
Jinpeng Li
Donghao Zhou
Bian Wu
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
233
9
0
22 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
369
9
0
18 Jan 2024
Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding
Morteza Moradi
S. Palazzo
C. Spampinato
183
7
0
15 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
339
2
0
15 Jan 2024
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
358
2
0
10 Jan 2024