arXiv:1712.04851 (versions: v1, v2 latest)
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 675 papers shown
Title
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
167
2
0
17 Jan 2023
TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Feiyan Hu
S. Palazzo
Federica Proietto Salanitri
Giovanni Bellitto
Morteza Moradi
C. Spampinato
Kevin McGuinness
135
16
0
11 Jan 2023
Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation
International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022
Hilmil Pradana
Minh-Son Dao
K. Zettsu
137
6
0
06 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Computer Vision and Pattern Recognition (CVPR), 2023
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
394
69
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
339
4
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Computer Vision and Pattern Recognition (CVPR), 2023
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
436
47
0
05 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Guohao Li
AAML
262
12
0
03 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
333
77
0
31 Dec 2022
An end-to-end multi-scale network for action prediction in videos
Xiaofan Liu
Jianqin Yin
Yuanxi Sun
Zhicheng Zhang
Jin Tang
149
1
0
31 Dec 2022
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
Xi Shen
Zhedong Zheng
Yi Yang
SLR
303
24
0
25 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
Machine Vision and Applications (MVA), 2022
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
262
6
0
21 Dec 2022
MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning
Yuan Liu
Jiacheng Chen
Hao Wu
217
3
0
21 Dec 2022
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2022
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
293
40
0
12 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
248
91
0
09 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
284
69
0
09 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
IEEE Access (IEEE Access), 2022
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
174
3
0
09 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Neural Networks (NN), 2022
Santosh Kumar Yadav
Achleshwar Luthra
Esha Pahwa
K. Tiwari
Heena Rathore
Hari Mohan Pandey
Peter Corcoran
206
19
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
Computer Vision and Pattern Recognition (CVPR), 2022
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
224
68
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
International Journal of Computer Vision (IJCV), 2022
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
227
2
0
03 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
177
13
0
02 Dec 2022
Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition
Rohit Gupta
Naveed Akhtar
Gaurav Kumar Nayak
Lin Wang
M. Shah
AAML
179
1
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
163
1
0
23 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
162
10
0
21 Nov 2022
Generalizable Deepfake Detection with Phase-Based Motion Analysis
IEEE Transactions on Image Processing (IEEE TIP), 2022
Ekta Prashnani
Michael Goebel
B. S. Manjunath
188
15
0
17 Nov 2022
Dynamic Temporal Filtering in Video Models
European Conference on Computer Vision (ECCV), 2022
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
219
24
0
15 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
121
9
0
08 Nov 2022
Two-Stream Network for Sign Language Recognition and Translation
Neural Information Processing Systems (NeurIPS), 2022
Yutong Chen
Ronglai Zuo
Fangyun Wei
Yu-Huan Wu
Shujie Liu
Brian Mak
SLR
206
190
0
02 Nov 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
138
31
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2022
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
255
84
0
12 Oct 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
215
15
0
10 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors
British Machine Vision Conference (BMVC), 2022
Federico Baldassarre
Quentin Debard
Gonzalo Fiz Pontiveros
Tri Kurniawan Wijaya
186
5
0
07 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
IEEE Transactions on Multimedia (IEEE TMM), 2022
Tianwen Qian
Ran Cui
Yue Yu
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
311
25
0
05 Oct 2022
Alignment-guided Temporal Attention for Video Action Recognition
Neural Information Processing Systems (NeurIPS), 2022
Yizhou Zhao
Zhenyang Li
Xun Guo
Yan Lu
139
19
0
30 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
International Conference on Learning Representations (ICLR), 2022
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
283
1,766
0
29 Sep 2022
Rethinking Resolution in the Context of Efficient Video Recognition
Neural Information Processing Systems (NeurIPS), 2022
Chuofan Ma
Qiushan Guo
Yi Jiang
Zehuan Yuan
Ping Luo
Xiaojuan Qi
195
16
0
26 Sep 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Neural Information Processing Systems (NeurIPS), 2022
Haoyu Lu
Mingyu Ding
Nanyi Fei
Yuqi Huo
Zhiwu Lu
VLM
256
19
0
23 Sep 2022
Multi-level Adversarial Spatio-temporal Learning for Footstep Pressure based FoG Detection
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2022
Kun Hu
Shaohui Mei
Wei Wang
K. E. Martens
Liang Wang
S. Lewis
Dagan Feng
Zhiyong Wang
187
9
0
22 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
259
178
0
15 Sep 2022
Multiple View Performers for Shape Completion
David Watkins-Valls
Peter K. Allen
K. Choromanski
Jacob Varley
Nicholas R. Waytowich
103
1
0
13 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Computer Vision and Pattern Recognition (CVPR), 2022
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
586
82
0
04 Sep 2022
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
European Conference on Computer Vision (ECCV), 2022
Tianjiao Li
Lin Geng Foo
Qiuhong Ke
Hossein Rahmani
Anran Wang
Jinghua Wang
Jing Liu
190
28
0
03 Sep 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Zdravko Marinov
Alina Roitberg
David Schneider
Rainer Stiefelhagen
196
6
0
19 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
126
3
0
16 Aug 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
European Conference on Computer Vision (ECCV), 2022
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
200
41
0
14 Aug 2022
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
European Conference on Computer Vision (ECCV), 2022
Jingcheng Ni
Nana Zhou
Jie Qin
Qianrun Wu
Junqi Liu
Boxun Li
Di Huang
SSL
180
20
0
12 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
129
12
0
12 Aug 2022
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
Ying Fan
Longfei Han
Yue Zhang
Lechao Cheng
Chenzhen Xia
Di Hu
SSL
164
1
0
10 Aug 2022
Sports Video Analysis on Large-Scale Data
European Conference on Computer Vision (ECCV), 2022
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
130
23
0
09 Aug 2022
Frozen CLIP Models are Efficient Video Learners
European Conference on Computer Vision (ECCV), 2022
Ziyi Lin
Shijie Geng
Renrui Zhang
Shiyang Feng
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Jiaming Song
CLIP
VLM
231
251
0
06 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
European Conference on Computer Vision (ECCV), 2022
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
263
425
0
04 Aug 2022