ResearchTrend.AI
AIM: Adapting Image Models for Efficient Video Action Recognition


6 February 2023
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
    ViT

Papers citing "AIM: Adapting Image Models for Efficient Video Action Recognition"

50 / 105 papers shown
Title
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
30
0
0
09 May 2025
Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer
Wenxuan Liu
X. Zhong
Zhuo Zhou
S. Yang
Chia-Wen Lin
Alex Chichung Kot
32
0
0
29 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
50
0
0
01 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
34
0
0
31 Mar 2025
Adaptive Wavelet Filters as Practical Texture Feature Amplifiers for Parkinson's Disease Screening in OCT
X. Zhang
Hanfeng Shi
X. Li
Haili Ye
Tao Xu
Na Li
Yan Hu
Fan Lv
J. Chen
Jiang Liu
42
0
0
25 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
37
0
0
04 Mar 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
41
0
0
27 Feb 2025
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding
Luoying Hao
Yan Hu
Yang Yue
Li Wu
Huazhu Fu
Jinming Duan
Jiang Liu
59
0
0
24 Feb 2025
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
Feng Lu
Tong Jin
X. Lan
Lijun Zhang
Yunpeng Liu
Yaowei Wang
Chun Yuan
31
0
0
23 Feb 2025
Self-Prompt SAM: Medical Image Segmentation via Automatic Prompt SAM Adaptation
Bin Xie
Hao Tang
Dawen Cai
Yan Yan
Gady Agam
MedIm
VLM
50
1
0
02 Feb 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
37
6
0
23 Jan 2025
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
72
1
0
20 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
75
0
0
20 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
41
0
0
18 Nov 2024
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
31
0
0
10 Nov 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
54
1
0
30 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
20
1
0
14 Oct 2024
SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition
Shu Yang
Zhiyuan Cai
Luyang Luo
Ning Ma
Shuchang Xu
Hao Chen
16
0
0
30 Sep 2024
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task
Lingyun Huang
Jianxu Mao
Yaonan Wang
Junfei Yi
Ziming Tao
VLM
VPVLM
35
1
0
27 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
27
1
0
13 Aug 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
16
1
0
12 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
27
2
0
11 Aug 2024
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
Manu S. Pillai
Mamshad Nayeem Rizve
M. Shah
30
2
0
05 Aug 2024
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
Congqi Cao
Guibiao Liao
Yating Yu
Kanglin Liu
Lingtong Min
Yanning Zhang
30
3
0
01 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
24
1
0
30 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
44
1
0
09 Jul 2024
CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community
Yan Liu
Bin Guo
Nuo Li
Yasan Ding
Zhouyangzi Zhang
Zhiwen Yu
30
1
0
09 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
45
5
0
08 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
39
7
0
05 Jul 2024
ASteISR: Adapting Single Image Super-resolution Pre-trained Model for Efficient Stereo Image Super-resolution
Yuanbo Zhou
Yuyang Xue
Wei Deng
Xinlin Zhang
Qinquan Gao
Tong Tong
37
0
0
04 Jul 2024
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs
Haodong Chen
Haojian Huang
Junhao Dong
Mingzhe Zheng
Dian Shao
40
15
0
02 Jul 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
41
3
0
20 Jun 2024
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Akshita Gupta
Gaurav Mittal
Ahmed Magooda
Ye Yu
Graham W. Taylor
Mei Chen
44
2
0
01 Apr 2024
Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint
Linghan Cai
Shenjin Huang
Ye Zhang
Jinpeng Lu
Yongbing Zhang
23
1
0
30 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
136
301
0
21 Mar 2024
MaskSAM: Towards Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie
Hao Tang
Bin Duan
Dawen Cai
Yan Yan
Gady Agam
VLM
MedIm
23
0
0
21 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
26
0
0
11 Mar 2024
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu
Xiangyuan Lan
Lijun Zhang
Dongmei Jiang
Yaowei Wang
Chun Yuan
36
29
0
29 Feb 2024
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
Feng Lu
Lijun Zhang
Xiangyuan Lan
Shuting Dong
Yaowei Wang
Chun Yuan
40
28
0
22 Feb 2024
Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction
Cheng Feng
Long Huang
Denis Krompass
AI4TS
8
5
0
12 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
92
22
0
08 Feb 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
49
19
0
05 Feb 2024
Region-Based Representations Revisited
Michal Shlapentokh-Rothman
Ansel Blume
Yao Xiao
Yuqun Wu
TV Sethuraman
Heyi Tao
Jae Yong Lee
Wilfredo Torres
Yu-xiong Wang
Derek Hoiem
32
5
0
04 Feb 2024
Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning
Li Ren
Chen Chen
Liqiang Wang
Kien Hua
16
4
0
04 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
56
71
0
03 Feb 2024
PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation
Zhaozhi Xie
Bochen Guan
Weihao Jiang
Muyang Yi
Yue Ding
Hongtao Lu
Lei Zhang
VLM
25
12
0
23 Jan 2024
Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models
Chenyu Lian
Hong-Yu Zhou
Yizhou Yu
Liansheng Wang
MedIm
24
6
0
22 Jan 2024
AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks
Yun Liang
Hai Lin
Shaojian Qiu
Yihang Zhang
11
1
0
19 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024