Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown
Masked Autoencoder for Unsupervised Video Summarization
Minho Shim
Taeoh Kim
Jinhyung Kim
Dongyoon Wee
176
3
0
02 Jun 2023
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
European Conference on Computer Vision (ECCV), 2023
Sanjoy Kundu
Shubham Trehan
Sathyanarayanan N. Aakur
LRM, LM&Ro
311
5
0
26 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Neurocomputing (Neurocomputing), 2023
Thanh-Dat Truong
Khoa Luu
EgoV
393
15
0
25 May 2023
TG-VQA: Ternary Game of Video Question Answering
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
241
12
0
17 May 2023
Lightweight Delivery Detection on Doorbell Cameras
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Pirazh Khorramshahi
Zhe Wu
Tianchen Wang
Luke Deluccia
Hongcheng Wang
199
0
0
13 May 2023
Visual Tuning
ACM Computing Surveys (ACM Comput. Surv.), 2023
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
440
60
0
10 May 2023
Improve Video Representation with Temporal Adversarial Augmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Jinhao Duan
Quanfu Fan
Hao-Ran Cheng
Xiaoshuang Shi
Kaidi Xu
AAML, AI4TS, ViT
244
3
0
28 Apr 2023
SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation
Neurocomputing (Neurocomputing), 2023
Fisseha Admasu Ferede
M. Balasubramanian
135
4
0
26 Apr 2023
MRSN: Multi-Relation Support Network for Video Action Detection
IEEE International Conference on Multimedia and Expo (ICME), 2023
Yin-Dong Zheng
Guo Chen
Minglei Yuan
Tong Lu
272
10
0
24 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
S. Tu
Jingdong Sun
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
313
59
0
20 Apr 2023
Pretrained Language Models as Visual Planners for Human Assistance
IEEE International Conference on Computer Vision (ICCV), 2023
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
355
35
0
17 Apr 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
International Conference on Learning Representations (ICLR), 2023
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
678
9
0
15 Apr 2023
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Kai Zhao
Kun Yuan
Ming-Ting Sun
Xingsen Wen
184
29
0
13 Apr 2023
AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
Wentao Zhu
Yufang Huang
Xi Xie
Wenxian Liu
Jincan Deng
Debing Zhang
Zinan Lin
Ji Liu
320
22
0
12 Apr 2023
Scallop: A Language for Neurosymbolic Programming
Ziyang Li
Jiani Huang
Mayur Naik
ReLM, LRM, NAI
218
58
0
10 Apr 2023
Hyperspectral Image Super-Resolution via Dual-domain Network Based on Hybrid Convolution
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Tingting Liu
Yuan Liu
Chun-liang Zhang
Liyin Yuan
Xiubao Sui
Qian Chen
SupR
546
52
0
10 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
International Conference on Learning Representations (ICLR), 2023
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
181
16
0
07 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Computer Vision and Pattern Recognition (CVPR), 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM, VPVLM
232
112
0
06 Apr 2023
Sketch-based Video Object Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sangmin Woo
So-Yeong Jeon
Jinyoung Park
Minji Son
Sumin Lee
Changick Kim
438
0
0
02 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
194
6
0
01 Apr 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Computer Vision and Pattern Recognition (CVPR), 2023
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
253
47
0
31 Mar 2023
Streaming Video Model
Computer Vision and Pattern Recognition (CVPR), 2023
Yucheng Zhao
Chong Luo
Chuanxin Tang
DongDong Chen
Noel Codella
Zhengjun Zha
252
20
0
30 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Computer Vision and Pattern Recognition (CVPR), 2023
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
363
9
0
29 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
195
3
0
28 Mar 2023
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Computer Vision and Pattern Recognition (CVPR), 2023
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
220
47
0
27 Mar 2023
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
Computer Vision and Pattern Recognition (CVPR), 2023
Davide Moltisanti
Frank Keller
Hakan Bilen
Laura Sevilla-Lara
309
8
0
27 Mar 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Andong Deng
Taojiannan Yang
Chong Chen
AI4TS
244
18
0
23 Mar 2023
Natural Language-Assisted Sign Language Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Ronglai Zuo
Fangyun Wei
Brian Mak
SLR
230
81
0
21 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
IEEE International Conference on Computer Vision (ICCV), 2023
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
348
12
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
250
57
0
17 Mar 2023
Video Action Recognition with Attentive Semantic Units
IEEE International Conference on Computer Vision (ICCV), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
223
17
0
17 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Computer Vision and Pattern Recognition (CVPR), 2023
Jun Xiong
Gang Wang
Peng Zhang
Wei Huang
Yufei Zha
Guangtao Zhai
164
19
0
11 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
154
0
0
09 Mar 2023
Improving Video Retrieval by Adaptive Margin
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Feng He
Qi Wang
Zhifan Feng
Wenbin Jiang
Yajuan Lü
Yong Zhu
Xiao Tan
295
24
0
09 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
274
31
0
09 Mar 2023
Continuity-Aware Latent Interframe Information Mining for Reliable UAV Tracking
IEEE International Conference on Robotics and Automation (ICRA), 2023
Changhong Fu
Mutian Cai
Sihang Li
Kunhan Lu
Haobo Zuo
Chongjun Liu
250
8
0
08 Mar 2023
Continuous Sign Language Recognition with Correlation Network
Computer Vision and Pattern Recognition (CVPR), 2023
Lianyu Hu
Liqing Gao
Zekang Liu
Wei Feng
SLR
363
116
0
06 Mar 2023
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
International Conference on Learning Representations (ICLR), 2023
Junyan Wang
Zhenhong Sun
Yichen Qian
Dong Gong
Xiuyu Sun
Ming Lin
Maurice Pagnucco
Yang Song
3DPC
199
14
0
05 Mar 2023
Temporal Coherent Test-Time Optimization for Robust Video Classification
International Conference on Learning Representations (ICLR), 2023
Chenyu Yi
Siyuan Yang
Yufei Wang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
TTA
218
16
0
28 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
252
52
0
27 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
230
31
0
24 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2023
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
390
9
0
20 Feb 2023
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
Scientific Reports (Sci Rep), 2023
N. H. Phong
B. Ribeiro
283
22
0
17 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
256
36
0
13 Feb 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
246
12
0
04 Feb 2023
Learning Large-scale Neural Fields via Context Pruned Meta-Learning
Neural Information Processing Systems (NeurIPS), 2023
Jihoon Tack
Subin Kim
Sihyun Yu
Jaeho Lee
Jinwoo Shin
Jonathan Richard Schwarz
311
14
0
01 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yizhen Chen
Jie Wang
Lijian Lin
Chen Ma
Jin Ma
Ying Shan
VLM
257
34
0
30 Jan 2023
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
250
17
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS, CLIP, VLM
262
75
0
26 Jan 2023
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism
IEEE International Symposium on Multimedia (ISM), 2022
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
155
5
0
18 Jan 2023