1712.04851
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 675 papers shown
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
404
2
0
10 Jan 2024
Multi-Stage Contrastive Regression for Action Quality Assessment
Qi An
Mengshi Qi
Huadong Ma
202
8
0
05 Jan 2024
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Neural Information Processing Systems (NeurIPS), 2024
Ziyi Bai
Ruiping Wang
Xilin Chen
353
13
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Jiebo Luo
Chenliang Xu
VLM
760
174
0
29 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
271
12
0
21 Dec 2023
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
260
1
0
14 Dec 2023
Generative Model-based Feature Knowledge Distillation for Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2023
Guiqin Wang
Peng Zhao
Yanjiang Shi
Cong Zhao
Shusen Yang
VLM
245
6
0
14 Dec 2023
ConFormer: A Novel Collection of Deep Learning Models to Assist Cardiologists in the Assessment of Cardiac Function
Ethan Thomas
Salman Aslam
MedIm
243
1
0
13 Dec 2023
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Yi Li
Aarti Gupta
Sharad Malik
148
1
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
510
2
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
111
1
0
29 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
International Conference on Agents and Artificial Intelligence (ICAART), 2023
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
160
3
0
28 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
310
16
0
27 Nov 2023
MoVideo: Motion-Aware Video Generation with Diffusion Models
Christos Sakaridis
Yuchen Fan
Kai Zhang
Radu Timofte
Luc Van Gool
Rakesh Ranjan
DiffM
VGen
207
14
0
19 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
International Conference on Learning Representations (ICLR), 2023
İlker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
276
21
0
13 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
353
3
0
30 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Neural Information Processing Systems (NeurIPS), 2023
Sumedh Anand Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
244
115
0
11 Oct 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jingyuan Qi
Minqian Liu
Ying Shen
Zhiyang Xu
Lifu Huang
LRM
VGen
315
3
0
08 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models
Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
M. Cranmer
...
Francois Lanusse
Mariel Pettee
Tiberiu Teşileanu
Kyunghyun Cho
Shirley Ho
PINN
AI4CE
293
83
0
04 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
European Conference on Computer Vision (ECCV), 2023
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
324
18
0
02 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
279
23
0
28 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
216
4
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
232
6
0
16 Sep 2023
UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection
Jun Xiong
Peng Zhang
Chuanyue Li
Wei Huang
Yufei Zha
Tao You
ViT
159
3
0
15 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
IEEE Transactions on Automation Science and Engineering (IEEE TASE), 2023
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
286
22
0
10 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
IEEE International Conference on Computer Vision (ICCV), 2023
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
183
12
0
05 Sep 2023
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
Irfan Essa
SSL
231
7
0
03 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
281
9
0
02 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
307
33
0
27 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
142
5
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2023
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
213
28
0
24 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
197
0
0
24 Aug 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
ACM Multimedia (ACM MM), 2023
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
212
3
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Neural Information Processing Systems (NeurIPS), 2023
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
315
23
0
22 Aug 2023
Temporal-Distributed Backdoor Attack Against Video Based Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xi Li
Songhe Wang
Rui Huang
Mahanth K. Gowda
G. Kesidis
AAML
426
7
0
21 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
IEEE International Conference on Computer Vision (ICCV), 2023
Fangyun Wei
Yutong Chen
SLR
212
40
0
21 Aug 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
182
0
0
21 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
ACM Multimedia (ACM MM), 2023
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
179
31
0
09 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
ACM Multimedia (ACM MM), 2023
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
250
19
0
31 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
ACM Multimedia (ACM MM), 2023
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
223
9
0
27 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
224
17
0
18 Jul 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
IEEE International Conference on Computer Vision (ICCV), 2023
Hongfei Yan
Zehua Wang
Yushen Wei
Zerui Li
Guanbin Li
283
66
0
17 Jul 2023
TALL: Thumbnail Layout for Deepfake Video Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Yuting Xu
Jian Liang
Gengyun Jia
Ziming Yang
Yanhao Zhang
Ran He
ViT
322
108
0
14 Jul 2023
TVPR: Text-to-Video Person Retrieval and a New Benchmark
ACM Multimedia (ACM MM), 2023
Fan Ni
Xu Zhang
Jianhui Wu
Guan-Nan Dong
Aichun Zhu
Hui Liu
Yue Zhang
312
2
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
260
27
0
13 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision (ICCV), 2023
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
351
134
0
11 Jul 2023
Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models
Wei Han
Hui Chen
MingSung Kan
Soujanya Poria
495
3
0
09 Jul 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
Chinmay Hegde
Anuj Sharma
Soumik Sarkar
VLM
365
41
0
16 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Vasu Sharma
Srijan Das
ViT
261
4
0
15 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
IEEE International Conference on Computer Vision (ICCV), 2023
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
230
27
0
06 Jun 2023