Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
Computer Vision and Pattern Recognition (CVPR), 2021
Rui Hou
Hong Chang
Bingpeng Ma
Rui Huang
Shiguang Shan
167
105
0
30 Apr 2021
Three-stream network for enriched Action Recognition
Ivaxi Sheth
125
4
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Neural Information Processing Systems (NeurIPS), 2021
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
731
679
0
22 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaohan Wang
Linchao Zhu
Yi Yang
376
213
0
20 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Yue Yu
Yu-Gang Jiang
303
5
0
20 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
260
98
0
19 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
1.5K
1,001
0
18 Apr 2021
Adaptive Intermediate Representations for Video Understanding
Juhana Kangaspunta
A. Piergiovanni
Rico Jonschkowski
Michael S. Ryoo
A. Angelova
157
4
0
14 Apr 2021
Video Question Answering with Phrases via Semantic Roles
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Arka Sadhu
Kan Chen
Ram Nevatia
177
16
0
08 Apr 2021
Progressive Temporal Feature Alignment Network for Video Inpainting
Computer Vision and Pattern Recognition (CVPR), 2021
Xueyan Zou
Linjie Yang
Ding Liu
Yong Jae Lee
166
60
0
08 Apr 2021
ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization
Sanqing Qu
Guang Chen
Zhijun Li
Lijun Zhang
Fan Lu
Alois C. Knoll
143
61
0
07 Apr 2021
CCSNet: a deep learning modeling suite for CO$_2$ storage
Gege Wen
C. Hay
S. Benson
241
93
0
05 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
838
1,452
0
01 Apr 2021
Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling
IEEE International Symposium on High-Performance Parallel Distributed Computing (HPDC), 2021
Sian Jin
Jesus Pulido
Pascal Grosset
Jiannan Tian
Dingwen Tao
J. Ahrens
145
10
0
01 Apr 2021
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
IEEE International Conference on Computer Vision (ICCV), 2021
Jiarui Xu
Xiaolong Wang
VOS
385
108
0
31 Mar 2021
Broaden Your Views for Self-Supervised Video Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
297
138
0
30 Mar 2021
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation
Shuning Chang
Pichao Wang
F. Wang
Hao Li
Jiashi Feng
ViT
223
46
0
30 Mar 2021
ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
553
2,708
0
29 Mar 2021
Busy-Quiet Video Disentangling for Video Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Guoxi Huang
A. Bors
275
10
0
29 Mar 2021
No frame left behind: Full Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2021
X. Liu
S. Pintea
Fatemeh Karimi Nejadasl
Olaf Booij
Jan van Gemert
251
45
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
339
165
0
28 Mar 2021
Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical Image Segmentation Using Deep Neural Networks: Past, Present, & Future
Teofilo E. Zosa
OOD
203
0
0
27 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
269
18
0
27 Mar 2021
Learning Comprehensive Motion Representation for Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2021
Mingyu Wu
Boyuan Jiang
Donghao Luo
Junchi Yan
Yabiao Wang
Ying Tai
Chengjie Wang
Jilin Li
Feiyue Huang
Xiaokang Yang
109
12
0
23 Mar 2021
AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2021
Lei Shi
Yifan Zhang
Jian Cheng
Hanqing Lu
162
56
0
22 Mar 2021
Efficient Spatialtemporal Context Modeling for Action Recognition
Neurocomputing, 2021
Congqi Cao
Yue Lu
Yifan Zhang
Dengyang Jiang
Yanning Zhang
245
6
0
20 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
221
148
0
19 Mar 2021
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition
Pengzhen Ren
Gang Xiao
Xiaojun Chang
Yun Xiao
Zhihui Li
Xiaojiang Chen
ViT
162
6
0
17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
International Journal of Computer Vision (IJCV), 2021
Andrew Shin
Masato Ishii
T. Narihira
289
50
0
06 Mar 2021
Unsupervised Motion Representation Enhanced Network for Action Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Xiaohang Yang
Lingtong Kong
Jie Yang
149
4
0
05 Mar 2021
VA-RED$^2$: Video Adaptive Redundancy Reduction
International Conference on Learning Representations (ICLR), 2021
Bowen Pan
Yikang Shen
Camilo Luciano Fosco
Chung-Ching Lin
A. Andonian
Yue Meng
Kate Saenko
A. Oliva
Rogerio Feris
287
19
0
15 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Computer Vision and Pattern Recognition (CVPR), 2021
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
458
748
0
11 Feb 2021
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
International Conference on Learning Representations (ICLR), 2021
Yue Meng
Yikang Shen
Chung-Ching Lin
P. Sattigeri
Leonid Karlinsky
Kate Saenko
A. Oliva
Rogerio Feris
292
70
0
10 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
International Conference on Machine Learning (ICML), 2021
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
1.1K
2,648
0
09 Feb 2021
Bridging the gap between Human Action Recognition and Online Action Detection
Alban Main De Boissiere
R. Noumeir
189
0
0
21 Jan 2021
Few-shot Action Recognition with Prototype-centered Attentive Learning
British Machine Vision Conference (BMVC), 2021
Xiatian Zhu
Antoine Toisoul
Juan-Manuel Pérez-Rúa
Li Zhang
Brais Martínez
Tao Xiang
215
57
0
20 Jan 2021
TCLR: Temporal Contrastive Learning for Video Representation
Computer Vision and Image Understanding (CVIU), 2021
I. Dave
Rohit Gupta
Mamshad Nayeem Rizve
Mubarak Shah
SSL
AI4TS
381
207
0
20 Jan 2021
3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021
Haokui Zhang
Chengrong Gong
Yunpeng Bai
Zongwen Bai
Ying Li
173
32
0
12 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
AAAI Conference on Artificial Intelligence (AAAI), 2021
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
138
5
0
11 Jan 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Hengduo Li
Zuxuan Wu
Abhinav Shrivastava
L. Davis
279
34
0
29 Dec 2020
Global Context Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Yue Cao
Jiarui Xu
Stephen Lin
Fangyun Wei
Han Hu
ISeg
230
128
0
24 Dec 2020
Human Action Recognition from Various Data Modalities: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
584
707
0
22 Dec 2020
TDN: Temporal Difference Networks for Efficient Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Limin Wang
Zhan Tong
Bin Ji
Gangshan Wu
450
462
0
18 Dec 2020
Multi-shot Temporal Event Localization: a Benchmark
Computer Vision and Pattern Recognition (CVPR), 2020
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Juil Sock
204
97
0
17 Dec 2020
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Tarun Kalluri
Deepak Pathak
Manmohan Chandraker
Du Tran
VGen
399
174
0
15 Dec 2020
GTA: Global Temporal Attention for Video Action Understanding
British Machine Vision Conference (BMVC), 2020
Bo He
Xitong Yang
Zuxuan Wu
Hao Chen
Ser-Nam Lim
Abhinav Shrivastava
ViT
183
27
0
15 Dec 2020
NUTA: Non-uniform Temporal Aggregation for Action Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Hao Chen
Joseph Tighe
ViT
120
17
0
15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
283
210
0
11 Dec 2020
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
309
82
0
11 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Computer Vision and Pattern Recognition (CVPR), 2020
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
312
71
0
10 Dec 2020