2304.10465
Cited By
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
20 April 2023
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
Papers citing "Implicit Temporal Modeling with Learnable Alignment for Video Recognition"
11 / 11 papers shown
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
28
0
0
09 May 2025
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
Ju He
Zhi-Qi Cheng
Chenyang Li
Wangmeng Xiang
Binghui Chen
Bin Luo
Yifeng Geng
Xuansong Xie
AI4CE
14
19
0
30 Mar 2023
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Hanyuan Chen
Ju He
Wangmeng Xiang
Zhi-Qi Cheng
W. Liu
Han-Wen Liu
Bin Luo
Yifeng Geng
Xuansong Xie
ViT
18
30
0
03 Feb 2023
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
161
428
0
04 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
178
281
0
06 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Kelvin C. K. Chan
Shangchen Zhou
Xiangyu Xu
Chen Change Loy
149
388
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021