AIM: Adapting Image Models for Efficient Video Action Recognition

6 February 2023
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
    ViT

Papers citing "AIM: Adapting Image Models for Efficient Video Action Recognition"

50 / 105 papers shown
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun-Hai Liu
Haolin Yang
Xu Si
Ling Liu
Zipeng Li
Yuxiang Zhang
Yebin Liu
Li Yi
52
22
0
16 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
19
13
0
11 Jan 2024
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Y. S. Rawat
VLM
22
7
0
13 Dec 2023
MinD-3D: Reconstruct High-quality 3D objects in Human Brain
Jianxiong Gao
Yu Fu
Yun Wang
Xuelin Qian
Jianfeng Feng
Yanwei Fu
DiffM
12
6
0
12 Dec 2023
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos
Yin Chen
Jia Li
Shiguang Shan
Meng Wang
Richang Hong
44
32
0
09 Dec 2023
Adapting Vision Transformer for Efficient Change Detection
Yang Zhao
Yuxiang Zhang
Yanni Dong
Bo Du
VLM
16
2
0
08 Dec 2023
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei
Shiwei Zhang
Zhiwu Qing
Hangjie Yuan
Zhiheng Liu
Yu Liu
Yingya Zhang
Jingren Zhou
Hongming Shan
DiffM
VGen
11
89
0
07 Dec 2023
The Potential of Vision-Language Models for Content Moderation of Children's Videos
Syed Hammad Ahmed
Shengnan Hu
G. Sukthankar
VLM
11
2
0
06 Dec 2023
D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei
Qizhong Tan
Guangming Lu
Jiandong Tian
39
3
0
03 Dec 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
15
7
0
30 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Bernard Ghanem
24
25
0
28 Nov 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
87
9
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
23
6
0
27 Nov 2023
Towards Few-shot Out-of-Distribution Detection
Jiuqing Dong
Yongbin Gao
Heng Zhou
Jun Cen
Yifan Yao
Sook Yoon
Park Dong Sun
OODD
11
3
0
20 Nov 2023
Promise:Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models
Hao Li
Han Liu
Dewei Hu
Jiacheng Wang
I. Oguz
MedIm
12
9
0
30 Oct 2023
Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin
Jia-Run Du
Yipeng Gao
Jiaming Zhou
Wei-Shi Zheng
29
7
0
27 Oct 2023
MACP: Efficient Model Adaptation for Cooperative Perception
Yunsheng Ma
Juanwu Lu
Can Cui
Sicheng Zhao
Xu Cao
Wenqian Ye
Ziran Wang
19
11
0
25 Oct 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
12
2
0
23 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
11
200
0
03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
27
8
0
02 Oct 2023
PEFTT: Parameter-Efficient Fine-Tuning for low-resource Tibetan pre-trained language models
Mingjun Zhou
Daiqing Zhuoma
Qun Nuo
T. Nyima
14
0
0
21 Sep 2023
3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images
Yifu Zhang
Zuo-Qiang Liu
Yang Feng
Renjing Xu
17
2
0
20 Sep 2023
Efficient Pyramid Channel Attention Network for Pathological Myopia Recognition
Xiaoqing Zhang
Jilu Zhao
Yan Li
Hao Wu
Xiangtian Zhou
Jiang Liu
10
1
0
17 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
9
1
0
15 Sep 2023
MS-UNet-v2: Adaptive Denoising Method and Training Strategy for Medical Image Segmentation with Small Training Data
Haoyuan Chen
Yufei Han
Pin Xu
Yanyi Li
Kuan Li
Jianping Yin
16
0
0
07 Sep 2023
RGB-T Tracking via Multi-Modal Mutual Prompt Learning
Yang Luo
Xiqing Guo
Hui Feng
Lei Ao
16
9
0
31 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction
Umar Khalid
Hasan Iqbal
Saeed Vahidian
Jing Hua
C. L. P. Chen
19
3
0
29 Aug 2023
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing
Qi Dai
Hang-Rui Hu
Zuxuan Wu
Yu-Gang Jiang
VGen
DiffM
14
81
0
18 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
11
0
0
03 Aug 2023
AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets
Siyi Du
Nourhan Bayasi
Ghassan Hamarneh
Rafeef Garbi
ViT
13
2
0
26 Jul 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
159
1
0
14 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Yin Cui
...
Florian Schroff
Hartwig Adam
Ming Yang
Ting Liu
Boqing Gong
ELM
22
9
0
06 Jul 2023
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Debaditya Roy
Dhruv Verma
Basura Fernando
VLM
CLIP
10
4
0
02 Jul 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
25
134
0
28 Jun 2023
3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation
Shizhan Gong
Yuan Zhong
Wenao Ma
Jinpeng Li
Zhao Wang
Jingyang Zhang
Pheng-Ann Heng
Qi Dou
MedIm
15
72
0
23 Jun 2023
Enhance-NeRF: Multiple Performance Evaluation for Neural Radiance Fields
Qianqiu Tan
Tao Liu
Yinling Xie
Shuwan Yu
Baohua Zhang
8
0
0
08 Jun 2023
Segment Anything in High Quality
Lei Ke
Mingqiao Ye
Martin Danelljan
Yifan Liu
Yu-Wing Tai
Chi-Keung Tang
F. I. F. Richard Yu
VLM
8
303
0
02 Jun 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
18
9
0
23 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
Zheng Yuan
HU Xue
Kun Wang
Yongming Liu
Kun Wang
VLM
MLLM
8
5
0
12 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
39
37
0
10 May 2023
Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval
Shiyin Dong
Mingrui Zhu
N. Wang
Xinbo Gao
VLM
16
3
0
09 May 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
25
35
0
20 Apr 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
C. L. P. Chen
AI4TS
22
12
0
23 Mar 2023
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
141
631
0
26 May 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
134
344
0
13 Oct 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
117
122
0
30 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021