Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 675 papers shown

BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification

Computer Vision and Pattern Recognition (CVPR), 2021

167

105

30 Apr 2021

Three-stream network for enriched Action Recognition

Ivaxi Sheth

125

27 Apr 2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Neural Information Processing Systems (NeurIPS), 2021

731

679

22 Apr 2021

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

Computer Vision and Pattern Recognition (CVPR), 2021

Xiaohan Wang

Linchao Zhu

Yi Yang

376

213

20 Apr 2021

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition

Zuxuan Wu

303

20 Apr 2021

Temporal Query Networks for Fine-grained Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2021

Chuhan Zhang

Ankush Gupta

Andrew Zisserman

260

19 Apr 2021

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

Tianrui Li

1.5K

1,001

18 Apr 2021

Adaptive Intermediate Representations for Video Understanding

157

14 Apr 2021

Video Question Answering with Phrases via Semantic Roles

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Arka Sadhu

Kan Chen

Ram Nevatia

177

08 Apr 2021

Progressive Temporal Feature Alignment Network for Video Inpainting

Computer Vision and Pattern Recognition (CVPR), 2021

Xueyan Zou

Linjie Yang

Ding Liu

Yong Jae Lee

166

08 Apr 2021

ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization

Guang Chen

143

07 Apr 2021

CCSNet: a deep learning modeling suite for CO$_2$

241

05 Apr 2021

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

IEEE International Conference on Computer Vision (ICCV), 2021

838

1,452

01 Apr 2021

Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling

IEEE International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2021

Dingwen Tao

145

01 Apr 2021

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

IEEE International Conference on Computer Vision (ICCV), 2021

Jiarui Xu

Xiaolong Wang

VOS

385

108

31 Mar 2021

Broaden Your Views for Self-Supervised Video Learning

IEEE International Conference on Computer Vision (ICCV), 2021

Adrià Recasens

Pauline Luc

Jean-Baptiste Alayrac

...

297

138

30 Mar 2021

Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation

Hao Li

223

30 Mar 2021

ViViT: A Video Vision Transformer

IEEE International Conference on Computer Vision (ICCV), 2021

553

2,708

29 Mar 2021

Busy-Quiet Video Disentangling for Video Classification

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021

Guoxi Huang

A. Bors

275

29 Mar 2021

No frame left behind: Full Video Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2021

X. Liu

S. Pintea

Fatemeh Karimi Nejadasl

Olaf Booij

Jan van Gemert

251

29 Mar 2021

HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval

IEEE International Conference on Computer Vision (ICCV), 2021

339

165

28 Mar 2021

Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical Image Segmentation Using Deep Neural Networks: Past, Present, & Future

Teofilo E. Zosa

OOD

203

27 Mar 2021

A Comprehensive Review of the Video-to-Text Problem

Artificial Intelligence Review (AIR), 2021

269

27 Mar 2021

Learning Comprehensive Motion Representation for Action Recognition

AAAI Conference on Artificial Intelligence (AAAI), 2021

Mingyu Wu

Boyuan Jiang

Donghao Luo

Junchi Yan

Yabiao Wang

Ying Tai

Chengjie Wang

Jilin Li

Feiyue Huang

Xiaokang Yang

109

23 Mar 2021

AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition

IEEE International Conference on Computer Vision (ICCV), 2021

162

22 Mar 2021

Efficient Spatialtemporal Context Modeling for Action Recognition

Neurocomputing, 2021

245

20 Mar 2021

MDMMT: Multidomain Multimodal Transformer for Video Retrieval

Maksim Dzabraev

M. Kalashnikov

Stepan Alekseevich Komkov

Aleksandr Petiushko

221

148

19 Mar 2021

NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition

Xiaojun Chang

162

17 Mar 2021

Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision

International Journal of Computer Vision (IJCV), 2021

Andrew Shin

Masato Ishii

T. Narihira

289

06 Mar 2021

Unsupervised Motion Representation Enhanced Network for Action Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Xiaohang Yang

Lingtong Kong

Jie Yang

149

05 Mar 2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

International Conference on Learning Representations (ICLR), 2021

287

15 Feb 2021

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Computer Vision and Pattern Recognition (CVPR), 2021

458

748

11 Feb 2021

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

International Conference on Learning Representations (ICLR), 2021

292

10 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?

International Conference on Machine Learning (ICML), 2021

Gedas Bertasius

Heng Wang

Lorenzo Torresani

ViT

1.1K

2,648

09 Feb 2021

Bridging the gap between Human Action Recognition and Online Action Detection

Alban Main De Boissiere

R. Noumeir

189

21 Jan 2021

Few-shot Action Recognition with Prototype-centered Attentive Learning

British Machine Vision Conference (BMVC), 2021

Li Zhang

215

20 Jan 2021

TCLR: Temporal Contrastive Learning for Video Representation

Computer Vision and Image Understanding (CVIU), 2021

Mubarak Shah

381

207

20 Jan 2021

3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification

IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021

173

12 Jan 2021

Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts

AAAI Conference on Artificial Intelligence (AAAI), 2021

Chen-Yu Lee

Tomas Pfister

138

11 Jan 2021

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

Computer Vision and Pattern Recognition (CVPR), 2020

Hengduo Li

Zuxuan Wu

Abhinav Shrivastava

L. Davis

279

29 Dec 2020

Global Context Networks

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

230

128

24 Dec 2020

Human Action Recognition from Various Data Modalities: A Review

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

Zehua Sun

Jun Liu

584

707

22 Dec 2020

TDN: Temporal Difference Networks for Efficient Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2020

Limin Wang

Zhan Tong

Bin Ji

Gangshan Wu

450

462

18 Dec 2020

Multi-shot Temporal Event Localization: a Benchmark

Computer Vision and Pattern Recognition (CVPR), 2020

Yao Hu

204

17 Dec 2020

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

399

174

15 Dec 2020

GTA: Global Temporal Attention for Video Action Understanding

British Machine Vision Conference (BMVC), 2020

Bo He

Xitong Yang

Zuxuan Wu

Hao Chen

Ser-Nam Lim

Abhinav Shrivastava

ViT

183

15 Dec 2020

NUTA: Non-uniform Temporal Aggregation for Action Recognition

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

Hao Chen

120

15 Dec 2020

A Comprehensive Study of Deep Video Action Recognition

Yi Zhu

Xinyu Li

Chunhui Liu

Mohammadreza Zolfaghari

283

210

11 Dec 2020

ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

Subramanian Ramanathan

Vineet Gandhi

ViT

309

11 Dec 2020

Look Before you Speak: Visually Contextualized Utterances

Computer Vision and Pattern Recognition (CVPR), 2020

Paul Hongsuck Seo

Arsha Nagrani

Cordelia Schmid

312

10 Dec 2020