Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
IEEE Access, 2020
Youngwan Lee
Hyungil Kim
Kimin Yun
Jinyoung Moon
257
13
0
01 Dec 2020
Recent Progress in Appearance-based Action Recognition
J. Humphreys
Zhe Chen
Dacheng Tao
170
0
0
25 Nov 2020
A3D: Adaptive 3D Networks for Video Action Recognition
Sijie Zhu
Taojiannan Yang
Matías Mendieta
Chong Chen
3DH
176
13
0
24 Nov 2020
Play Fair: Frame Attributions in Video Models
Asian Conference on Computer Vision (ACCV), 2020
Will Price
Dima Damen
FAtt
119
6
0
24 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations
Andreea-Maria Oncescu
João F. Henriques
Yang Liu
Andrew Zisserman
Samuel Albanie
VGen
172
12
0
22 Nov 2020
We don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
A. J. Rana
Yogesh S Rawat
ViT
137
12
0
22 Nov 2020
3D CNNs with Adaptive Temporal Feature Resolutions
Mohsen Fayyaz
Emad Bahrami Rad
Ali Diba
M. Noroozi
Ehsan Adeli
Luc Van Gool
Juergen Gall
3DPC
222
39
0
17 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Computer Vision and Pattern Recognition (CVPR), 2020
Linchao Zhu
Yi Yang
ViT
324
451
0
14 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
180
101
0
10 Nov 2020
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition
T. Ayral
M. Pedersoli
Simon L Bacon
Mohammadhadi Shateri
CVBM, 3DH
171
14
0
10 Nov 2020
Mutual Modality Learning for Video Action Classification
Stepan Alekseevich Komkov
Maksim Dzabraev
Aleksandr Petiushko
158
9
0
04 Nov 2020
PV-NAS: Practical Neural Architecture Search for Video Recognition
Zihao Wang
Chen Lin
Lu Sheng
Junjie Yan
Jing Shao
ViT
304
7
0
02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
L. Tao
Xueting Wang
T. Yamasaki
VLM, SSL
249
14
0
29 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
291
116
0
22 Oct 2020
Pose And Joint-Aware Action Recognition
Anshul B. Shah
Shlok Kumar Mishra
Ankan Bansal
Jun-Cheng Chen
Ramalingam Chellappa
Abhinav Shrivastava
328
36
0
16 Oct 2020
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning
Xinyu Yang
Majid Mirmehdi
T. Burghardt
391
4
0
14 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation
ACM Multimedia (ACM MM), 2020
Junfu Pu
Wen-gang Zhou
Hezhen Hu
Houqiang Li
182
126
0
11 Oct 2020
Contrastive Representation Learning: A Framework and Review
IEEE Access, 2020
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL, AI4TS
588
848
0
10 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
342
260
0
06 Oct 2020
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction
International Journal of Computer Vision (IJCV), 2020
Giovanni Bellitto
Federica Proietto Salanitri
S. Palazzo
Francesco Rundo
Daniela Giordano
C. Spampinato
MDE
363
61
0
02 Oct 2020
PERF-Net: Pose Empowered RGB-Flow Net
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Yinxiao Li
Zhichao Lu
Xuehan Xiong
Jonathan Huang
3DH
272
20
0
28 Sep 2020
On the spatiotemporal behavior in biology-mimicking computing systems
J. Végh
Ádám-József Berki
134
6
0
18 Sep 2020
Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks
Neural Information Processing Systems (NeurIPS), 2020
Iulia Duta
Andrei Liviu Nicolicioiu
Marius Leordeanu
326
7
0
17 Sep 2020
Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Computer Vision and Pattern Recognition (CVPR), 2020
Yanyi Zhang
Xinyu Li
I. Marsic
HAI
157
28
0
16 Sep 2020
Online Spatiotemporal Action Detection and Prediction via Causal Representations
Gurkirt Singh
3DPC, CML
181
0
0
31 Aug 2020
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Jiangliu Wang
Jianbo Jiao
Linchao Bao
Shengfeng He
Wei Liu
Yunhui Liu
SSL, AI4TS
199
59
0
31 Aug 2020
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis
J. Ortega
Neslihan Köse
P. Cañas
Min-An Chao
A. Unnervik
Marcos Nieto
Oihana Otaegui
L. Salgado
188
120
0
27 Aug 2020
Making a Case for 3D Convolutions for Object Segmentation in Videos
British Machine Vision Conference (BMVC), 2020
Sabarinath Mahadevan
A. Athar
Aljosa Osep
Sebastian Hennen
Laura Leal-Taixé
Bastian Leibe
VOS
322
96
0
26 Aug 2020
Effective Action Recognition with Embedded Key Point Shifts
Pattern Recognition (Pattern Recognit.), 2020
Haozhi Cao
Yuecong Xu
Jianfei Yang
K. Mao
Jianxiong Yin
Simon See
147
7
0
26 Aug 2020
Global-local Enhancement Network for NMFs-aware Sign Language Recognition
Hezhen Hu
Wen-gang Zhou
Junfu Pu
Houqiang Li
SLR
242
65
0
24 Aug 2020
AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo
A. Piergiovanni
Juhana Kangaspunta
A. Angelova
169
50
0
18 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL, AI4TS
248
251
0
13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
269
117
0
13 Aug 2020
TransNet V2: An effective deep network architecture for fast shot transition detection
ACM Multimedia (ACM MM), 2020
Tomás Soucek
Jakub Lokoč
301
181
0
11 Aug 2020
Spatiotemporal Contrastive Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2020
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Jian Shu
Serge J. Belongie
Huayu Chen
SSL, AI4TS
409
543
0
09 Aug 2020
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance
Can Zhang
Yuexian Zou
Guang Chen
Lei Gan
156
45
0
08 Aug 2020
Exploring Relations in Untrimmed Videos for Self-Supervised Learning
Dezhao Luo
Bo Fang
Can Ma
Yucan Zhou
Dayan Wu
Weiping Wang
223
23
0
06 Aug 2020
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework
ACM Multimedia (ACM MM), 2020
Li Tao
Xueting Wang
T. Yamasaki
SSL
336
114
0
06 Aug 2020
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition
M. E. Kalfaoglu
Sinan Kalkan
A. Aydin Alatan
3DPC
198
154
0
03 Aug 2020
Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition
Jiawei Chen
Jenson Hsiao
C. Ho
200
6
0
03 Aug 2020
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)
Samuel Albanie
Yang Liu
Arsha Nagrani
Antoine Miech
Ernesto Coto
...
Kaixu Cui
Hui Liu
Chen Wang
Yudong Jiang
Xiaoshuai Hao
168
13
0
03 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Gaowen Liu
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
244
51
0
29 Jul 2020
Approximated Bilinear Modules for Temporal Modeling
IEEE International Conference on Computer Vision (ICCV), 2019
Xinqi Zhu
Chang Xu
Langwen Hui
Cewu Lu
Dacheng Tao
124
27
0
25 Jul 2020
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
European Conference on Computer Vision (ECCV), 2020
Xiaofang Wang
Xuehan Xiong
Maxim Neumann
A. Piergiovanni
Michael S. Ryoo
A. Angelova
Kris Kitani
Wei Hua
294
52
0
23 Jul 2020
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos
Yuan Tian
Guangtao Zhai
Zhiyong Gao
157
0
0
22 Jul 2020
Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition
Sudhakar Kumawat
Manisha Verma
Yuta Nakashima
Shanmuganathan Raman
334
49
0
22 Jul 2020
Directional Temporal Modeling for Action Recognition
Xinyu Li
Bing Shuai
Joseph Tighe
123
47
0
21 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
1.1K
675
0
21 Jul 2020
Hierarchical Contrastive Motion Learning for Video Action Recognition
Xitong Yang
Xiaodong Yang
Sifei Liu
Deqing Sun
L. Davis
Jan Kautz
SSL
290
15
0
20 Jul 2020
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
Heeseung Kwon
Manjin Kim
Suha Kwak
Minsu Cho
FAtt
165
143
0
20 Jul 2020