Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Zdravko Marinov
David Schneider
Alina Roitberg
Rainer Stiefelhagen
VGen
208
3
0
03 Aug 2022
Two-Stream Transformer Architecture for Long Video Understanding
British Machine Vision Conference (BMVC), 2022
Edward Fish
Jon Weinbren
Andrew Gilbert
ViT
97
10
0
02 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
European Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
236
21
0
01 Aug 2022
Static and Dynamic Concepts for Self-supervised Video Representation Learning
European Conference on Computer Vision (ECCV), 2022
Rui Qian
Shuangrui Ding
Xian Liu
Dahua Lin
SSL
176
26
0
26 Jul 2022
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos
Jiang Bian
Xuhong Li
Tao Wang
Qingzhong Wang
Jun Huang
Chen Liu
Jun Zhao
Feixiang Lu
Dejing Dou
Haoyi Xiong
189
17
0
26 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
European Conference on Computer Vision (ECCV), 2022
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
198
14
0
21 Jul 2022
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
European Conference on Computer Vision (ECCV), 2022
Boyang Xia
Wenhao Wu
Haoran Wang
Rui Su
Dongliang He
Haosen Yang
Xiaoran Fan
Wanli Ouyang
225
24
0
21 Jul 2022
Temporal Saliency Query Network for Efficient Video Recognition
European Conference on Computer Vision (ECCV), 2022
Boyang Xia
Zhihao Wang
Wenhao Wu
Haoran Wang
Jungong Han
220
19
0
21 Jul 2022
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
European Conference on Computer Vision (ECCV), 2022
Huseyin Coskun
Alireza Zareian
Joshua L. Moore
F. Tombari
Chen Wang
SSL
193
3
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Asian Conference on Computer Vision (ACCV), 2022
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
346
31
0
20 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
IEEE Access (IEEE Access), 2022
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
202
12
0
20 Jul 2022
Learning Sequence Representations by Non-local Recurrent Neural Memory
International Journal of Computer Vision (IJCV), 2022
Wenjie Pei
Xin Feng
Canmiao Fu
Qi Cao
Guangming Lu
Yu-Wing Tai
AI4TS
286
0
0
20 Jul 2022
ERA: Expert Retrieval and Assembly for Early Action Prediction
European Conference on Computer Vision (ECCV), 2022
Lin Geng Foo
Tianjiao Li
Hossein Rahmani
Qiuhong Ke
Jing Liu
287
22
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
IEEE International Conference on Multimedia Big Data (ICMBD), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
238
5
0
16 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
261
169
0
16 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming-Hsuan Yang
Serge Belongie
Huayu Chen
VLM
172
26
0
15 Jul 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
ACM Multimedia (ACM MM), 2022
Huatian Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
ViT
174
11
0
12 Jul 2022
Video Graph Transformer for Video Question Answering
European Conference on Computer Vision (ECCV), 2022
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
492
94
0
12 Jul 2022
Dual Contrastive Learning for Spatio-temporal Representation
ACM Multimedia (ACM MM), 2022
Shuangrui Ding
Rui Qian
H. Xiong
AI4TS
SSL
150
25
0
12 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
179
3
0
08 Jul 2022
Video Dialog as Conversation about Objects Living in Space-Time
European Conference on Computer Vision (ECCV), 2022
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
209
14
0
08 Jul 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Neural Information Processing Systems (NeurIPS), 2022
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Yogesh S Rawat
Vibhav Vineet
VLM
581
29
0
05 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Computer Vision and Pattern Recognition (CVPR), 2022
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Yogesh S Rawat
AAML
272
36
0
04 Jul 2022
GraphVid: It Only Takes a Few Nodes to Understand a Video
European Conference on Computer Vision (ECCV), 2022
Eitan Kosman
Dotan Di Castro
GNN
234
5
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2022
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
394
125
0
04 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
International Conference on Learning Representations (ICLR), 2022
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
219
17
0
30 Jun 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Neural Information Processing Systems (NeurIPS), 2022
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Jiaming Song
364
264
0
27 Jun 2022
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
Computer Vision and Pattern Recognition (CVPR), 2022
S. H. Khorasgani
Yuxuan Chen
Florian Shkurti
SSL
210
31
0
25 Jun 2022
Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
Computer Vision and Pattern Recognition (CVPR), 2022
Kailai Zhou
Yibo Wang
Tao Lv
Yunqian Li
Linsen Chen
Qiu Shen
Xun Cao
206
18
0
23 Jun 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
146
13
0
22 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
International Journal of Computer Vision (IJCV), 2022
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
252
9
0
21 Jun 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
474
166
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Neural Information Processing Systems (NeurIPS), 2022
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
465
493
0
17 Jun 2022
Stand-Alone Inter-Frame Attention in Video Models
Computer Vision and Pattern Recognition (CVPR), 2022
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Jiebo Luo
Tao Mei
ViT
173
59
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Computer Vision and Pattern Recognition (CVPR), 2022
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
203
17
0
13 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
International Conference on Learning Representations (ICLR), 2022
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas Griffiths
Nori Jacoby
258
22
0
08 Jun 2022
Egocentric Video-Language Pretraining
Neural Information Processing Systems (NeurIPS), 2022
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
268
247
0
03 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
IEEE Transactions on Multimedia (IEEE TMM), 2022
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
244
85
0
02 Jun 2022
Deep Posterior Distribution-based Embedding for Hyperspectral Image Super-resolution
IEEE Transactions on Image Processing (IEEE TIP), 2022
Jinhui Hou
Zhiyu Zhu
Xianqiang Lyu
Huanqiang Zeng
Jinjian Wu
Jiantao Zhou
SupR
215
22
0
30 May 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
314
39
0
10 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Computer Vision and Image Understanding (CVIU), 2022
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
269
52
0
05 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
European Conference on Computer Vision (ECCV), 2022
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
155
1
0
03 May 2022
Preserve Pre-trained Knowledge: Transfer Learning With Self-Distillation For Action Recognition
Yang Zhou
Zhanhao He
Ke Lu
Guanhong Wang
Gaoang Wang
CLL
SLR
307
3
0
01 May 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
157
48
0
26 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
186
25
0
26 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
161
1
0
25 Apr 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
254
133
0
25 Apr 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Y. Hao
Shuo Wang
P. Cao
Xinjian Gao
Tong Xu
Jinmeng Wu
Xiangnan He
179
50
0
20 Apr 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
184
77
0
18 Apr 2022
Video Action Detection: Analysing Limitations and Challenges
Rajat Modi
A. J. Rana
Akash Kumar
Praveen Tirupattur
Shruti Vyas
Yogesh S Rawat
M. Shah
207
15
0
17 Apr 2022