Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
404
2
0
10 Jan 2024
Multi-Stage Contrastive Regression for Action Quality Assessment
Qi An
Mengshi Qi
Huadong Ma
202
8
0
05 Jan 2024
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Neural Information Processing Systems (NeurIPS), 2024
Ziyi Bai
Ruiping Wang
Xilin Chen
353
13
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Jiebo Luo
Chenliang Xu
VLM
760
174
0
29 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
271
12
0
21 Dec 2023
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
260
1
0
14 Dec 2023
Generative Model-based Feature Knowledge Distillation for Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2023
Guiqin Wang
Peng Zhao
Yanjiang Shi
Cong Zhao
Shusen Yang
VLM
245
6
0
14 Dec 2023
ConFormer: A Novel Collection of Deep Learning Models to Assist Cardiologists in the Assessment of Cardiac Function
Ethan Thomas
Salman Aslam
MedIm
243
1
0
13 Dec 2023
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Yi Li
Aarti Gupta
Sharad Malik
148
1
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
510
2
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
111
1
0
29 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
International Conference on Agents and Artificial Intelligence (ICAART), 2023
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
160
3
0
28 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
310
16
0
27 Nov 2023
MoVideo: Motion-Aware Video Generation with Diffusion Models
Christos Sakaridis
Yuchen Fan
Kai Zhang
Radu Timofte
Luc Van Gool
Rakesh Ranjan
DiffM
VGen
207
14
0
19 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
International Conference on Learning Representations (ICLR), 2023
İlker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
276
21
0
13 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
353
3
0
30 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Neural Information Processing Systems (NeurIPS), 2023
Sumedh Anand Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
244
115
0
11 Oct 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jingyuan Qi
Minqian Liu
Ying Shen
Zhiyang Xu
Lifu Huang
LRM
VGen
315
3
0
08 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models
Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
M. Cranmer
...
Francois Lanusse
Mariel Pettee
Tiberiu Teşileanu
Kyunghyun Cho
Shirley Ho
PINN
AI4CE
293
83
0
04 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
European Conference on Computer Vision (ECCV), 2023
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
324
18
0
02 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
279
23
0
28 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
216
4
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
232
6
0
16 Sep 2023
UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection
Jun Xiong
Peng Zhang
Chuanyue Li
Wei Huang
Yufei Zha
Tao You
ViT
159
3
0
15 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
IEEE Transactions on Automation Science and Engineering (IEEE TASE), 2023
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
286
22
0
10 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
IEEE International Conference on Computer Vision (ICCV), 2023
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
183
12
0
05 Sep 2023
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
Irfan Essa
SSL
231
7
0
03 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
281
9
0
02 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
307
33
0
27 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
142
5
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2023
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
213
28
0
24 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
197
0
0
24 Aug 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
ACM Multimedia (ACM MM), 2023
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
212
3
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Neural Information Processing Systems (NeurIPS), 2023
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
315
23
0
22 Aug 2023
Temporal-Distributed Backdoor Attack Against Video Based Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xi Li
Songhe Wang
Rui Huang
Mahanth K. Gowda
G. Kesidis
AAML
426
7
0
21 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
IEEE International Conference on Computer Vision (ICCV), 2023
Fangyun Wei
Yutong Chen
SLR
212
40
0
21 Aug 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
182
0
0
21 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
ACM Multimedia (ACM MM), 2023
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
179
31
0
09 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
ACM Multimedia (ACM MM), 2023
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
250
19
0
31 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
ACM Multimedia (ACM MM), 2023
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
223
9
0
27 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
224
17
0
18 Jul 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
IEEE International Conference on Computer Vision (ICCV), 2023
Hongfei Yan
Zehua Wang
Yushen Wei
Zerui Li
Guanbin Li
283
66
0
17 Jul 2023
TALL: Thumbnail Layout for Deepfake Video Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Yuting Xu
Jian Liang
Gengyun Jia
Ziming Yang
Yanhao Zhang
Ran He
ViT
322
108
0
14 Jul 2023
TVPR: Text-to-Video Person Retrieval and a New Benchmark
ACM Multimedia (ACM MM), 2023
Fan Ni
Xu Zhang
Jianhui Wu
Guan-Nan Dong
Aichun Zhu
Hui Liu
Yue Zhang
312
2
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
260
27
0
13 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision (ICCV), 2023
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
351
134
0
11 Jul 2023
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models
Wei Han
Hui Chen
MingSung Kan
Soujanya Poria
495
3
0
09 Jul 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
Chinmay Hegde
Anuj Sharma
Soumik Sarkar
VLM
365
41
0
16 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Vasu Sharma
Srijan Das
ViT
261
4
0
15 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
IEEE International Conference on Computer Vision (ICCV), 2023
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
230
27
0
06 Jun 2023