2307.08908
Cited By
What Can Simple Arithmetic Operations Do for Temporal Modeling?
18 July 2023
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
Papers citing
"What Can Simple Arithmetic Operations Do for Temporal Modeling?"
20 / 20 papers shown
Title
Sample-level Adaptive Knowledge Distillation for Action Recognition
Ping Li
Chenhao Ping
Wenxiao Wang
Mingli Song
49
0
0
01 Apr 2025
Storyboard guided Alignment for Fine-grained Video Action Recognition
Enqi Liu
Liyuan Pan
Yan Yang
Yiran Zhong
Zhijing Wu
Xinxiao Wu
Liu Liu
18
0
0
18 Oct 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
19
1
0
13 Aug 2024
MPT-PAR: Mix-Parameters Transformer for Panoramic Activity Recognition
Wenqing Gan
Yaoyu Li
Jian Li
Zhangang Lin
ViT
22
0
0
01 Aug 2024
Region-Based Representations Revisited
Michal Shlapentokh-Rothman
Ansel Blume
Yao Xiao
Yuqun Wu
TV Sethuraman
Heyi Tao
Jae Yong Lee
Wilfredo Torres
Yu-xiong Wang
Derek Hoiem
24
5
0
04 Feb 2024
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
13
4
0
04 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
79
9
0
27 Nov 2023
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu
Haipeng Luo
Bo Fang
Jingdong Wang
Wanli Ouyang
88
80
0
31 Dec 2022
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
91
47
0
31 Dec 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
141
261
0
17 Sep 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
154
676
0
22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
298
771
0
18 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
276
1,490
0
27 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
272
1,939
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
188
375
0
01 Feb 2021
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Wenhao Wu
Dongliang He
Tianwei Lin
Fu Li
Chuang Gan
Errui Ding
79
68
0
13 Dec 2020
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Chenxu Luo
Alan Yuille
102
149
0
28 Sep 2019
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu
Caiming Xiong
Chih-Yao Ma
R. Socher
L. Davis
110
194
0
29 Nov 2018
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
111
495
0
24 Apr 2018