1712.04851
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 675 papers shown
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
Guo Chen
Yin-Dong Zheng
Limin Wang
Tong Lu
AI4TS
207
83
0
07 Dec 2021
E²(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
242
69
0
07 Dec 2021
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
Zhaoqilin Yang
Gaoyun An
202
6
0
05 Dec 2021
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen
Ramprasaath R. Selvaraju
Shih-Fu Chang
Juan Carlos Niebles
Nikhil Naik
ViT
241
3
0
01 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Jingjing Jiang
Zi-yi Liu
N. Zheng
318
18
0
29 Nov 2021
Video Frame Interpolation Transformer
Computer Vision and Pattern Recognition (CVPR), 2021
Zhihao Shi
Xiangyu Xu
Xiaohong Liu
Jun Chen
Ming-Hsuan Yang
ViT
303
205
0
27 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
337
299
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
402
239
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Computer Vision and Pattern Recognition (CVPR), 2021
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
241
249
0
19 Nov 2021
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
261
14
0
17 Nov 2021
A Survey of Visual Transformers
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Peng Wang
Jianping Fan
Zhiqiang He
3DGS
ViT
467
477
0
11 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
135
5
0
05 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
British Machine Vision Conference (BMVC), 2021
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
209
29
0
02 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
292
31
0
01 Nov 2021
ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition
Masahiro Mitsuhara
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
208
1
0
29 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
303
28
0
27 Oct 2021
Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition
IEEE Access (IEEE Access), 2021
Hamed Valizadegan
D. Caldwell
SLR
138
59
0
24 Oct 2021
Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Neural Information Processing Systems (NeurIPS), 2021
Reuben Tan
Bryan A. Plummer
Kate Saenko
Hailin Jin
Bryan C. Russell
SSL
204
28
0
20 Oct 2021
Constrained Mean Shift for Representation Learning
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
149
0
0
19 Oct 2021
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
Yuxi Li
Boshen Zhang
Jian Li
Yabiao Wang
Weiyao Lin
Chengjie Wang
Jilin Li
Feiyue Huang
149
5
0
19 Oct 2021
MAAD: A Model and Dataset for "Attended Awareness" in Driving
Deepak Gopinath
Guy Rosman
Simon Stent
K. Terahata
L. Fletcher
B. Argall
John J. Leonard
125
15
0
16 Oct 2021
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions
Chenyu Yi
Siyuan Yang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
238
39
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
International Conference on Learning Representations (ICLR), 2021
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
415
68
0
12 Oct 2021
Early Melanoma Diagnosis with Sequential Dermoscopic Images
IEEE Transactions on Medical Imaging (IEEE TMI), 2021
Zhen Yu
Jennifer Nguyen
Toàn D. Nguyên
J. Kelly
C. Mclean
Paul Bonnington
Lei Zhang
Victoria Mar
Z. Ge
242
53
0
12 Oct 2021
Video Is Graph: Structured Graph Module for Video Action Recognition
Rongjie Li
Xiaojun Wu
Tianyang Xu
365
15
0
12 Oct 2021
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction
Rishubh Parihar
Gaurav Ramola
Ranajit Saha
Raviprasad Kini
Aniket Rege
S. Velusamy
142
1
0
03 Oct 2021
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
Jay Patravali
Gaurav Mittal
Ye Yu
Fuxin Li
Mei Chen
231
23
0
30 Sep 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
805
690
0
28 Sep 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Ji Lin
Chuang Gan
Kuan-Chieh Wang
Song Han
168
80
0
27 Sep 2021
Joint Multimedia Event Extraction from Video and Article
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Brian Chen
Xudong Lin
Christopher Thomas
Pengfei Yu
Shoya Yoshida
Lovish Chum
Heng Ji
Shih-Fu Chang
VGen
153
31
0
27 Sep 2021
Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021
Junjun He
Jin Ye
Cheng Li
Diping Song
Wanli Chen
Shanshan Wang
Lixu Gu
Yu Qiao
117
4
0
26 Sep 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
182
7
0
20 Sep 2021
Towards High-Quality Temporal Action Detection with Sparse Proposals
Jiannan Wu
Pei Sun
Shoufa Chen
Jiewen Yang
Zihao Qi
Lan Ma
Ping Luo
ViT
148
11
0
18 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
395
463
0
17 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
144
5
0
16 Sep 2021
Deep Visual Navigation under Partial Observability
Bo Ai
Wei Gao
Vinay
David Hsu
233
15
0
16 Sep 2021
Multi-modal Representation Learning for Video Advertisement Content Structuring
Daya Guo
Zhaoyang Zeng
113
6
0
04 Sep 2021
Revisiting 3D ResNets for Video Recognition
Xianzhi Du
Yeqing Li
Huayu Chen
Rui Qian
Jing Li
Irwan Bello
252
20
0
03 Sep 2021
Hierarchical 3D Feature Learning for Pancreas Segmentation
Federica Proietto Salanitri
Giovanni Bellitto
Ismail Irmakci
S. Palazzo
Ulas Bagci
C. Spampinato
MedIm
73
13
0
03 Sep 2021
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
ACM Transactions on Architecture and Code Optimization (TACO), 2020
Wei Niu
Jiexiong Guan
Yanzhi Wang
G. Agrawal
Bin Ren
AI4CE
225
187
0
30 Aug 2021
Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions
Machine Intelligence Research (MIR), 2021
Yang Wu
Dingheng Wang
Xiaotong Lu
Fan Yang
Guoqi Li
Weiming Dong
Jianbo Shi
378
18
0
30 Aug 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition
IEEE International Conference on Computer Vision (ICCV), 2021
Xinyu Gong
Heng Wang
Zheng Shou
Matt Feiszli
Zinan Lin
Zhicheng Yan
190
9
0
30 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Neural Information Processing Systems (NeurIPS), 2021
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
299
30
0
26 Aug 2021
Identity-aware Graph Memory Network for Action Detection
ACM Multimedia (ACM MM), 2021
Jingcheng Ni
Jie Qin
Di Huang
183
10
0
26 Aug 2021
Spatio-Temporal Self-Attention Network for Video Saliency Prediction
IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Ziqiang Wang
Zhi Liu
Gongyang Li
Yang Wang
Tianhong Zhang
Lihua Xu
Jijun Wang
3DPC
328
59
0
24 Aug 2021
ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning
IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Zhiwu Qing
Ziyuan Huang
Shiwei Zhang
Mingqian Tang
Changxin Gao
M. Ang
Ronglei Ji
Nong Sang
336
3
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
IEEE International Conference on Computer Vision (ICCV), 2021
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
220
154
0
23 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
259
101
0
20 Aug 2021
Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin
Xun Guo
Yan Lu
SSL
252
44
0
19 Aug 2021
Multi-Camera Trajectory Forecasting with Trajectory Tensors
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Olly Styles
T. Guha
Victor Sanchez
125
11
0
10 Aug 2021