ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.02399
  4. Cited By
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

4 December 2021
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
    VLM
    CLIP
ArXivPDFHTML

Papers citing "VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts"

43 / 43 papers shown
Title
Logits DeConfusion with CLIP for Few-Shot Learning
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li
F. Liu
Zehua Hao
X. Wang
Lingling Li
X. Liu
Puhua Chen
Wenping Ma
VLM
47
0
0
16 Apr 2025
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot
  Classification with CLIP
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
Yayuan Li
Jintao Guo
Lei Qi
Wenbin Li
Yinghuan Shi
VLM
CLIP
74
0
0
16 Dec 2024
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
Ziyao Zeng
Jingcheng Ni
Daniel Wang
Patrick Rim
Younjoon Chung
Fengyu Yang
Byung-Woo Hong
A. Wong
DiffM
MDE
98
2
0
24 Nov 2024
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through
  Language Descriptions
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
Ziyao Zeng
Yangchao Wu
Hyoungseob Park
Daniel Wang
Fengyu Yang
Stefano Soatto
Dong Lao
Byung-Woo Hong
Alex Wong
MDE
16
7
0
03 Oct 2024
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label
  Image Classification
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification
Jiexuan Yan
Sheng Huang
Nankun Mu
Luwen Huangfu
Bo Liu
VLM
19
0
0
15 Aug 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
31
9
0
19 Jul 2024
NODE-Adapter: Neural Ordinary Differential Equations for Better
  Vision-Language Reasoning
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang
Chun-Wun Cheng
Ke Yu
Zhihai He
Carola-Bibiane Schonlieb
Angelica I Aviles-Rivero
VLM
39
2
0
11 Jul 2024
Conceptual Codebook Learning for Vision-Language Models
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang
Ke Yu
Siqi Wu
Zhihai He
VLM
34
2
0
02 Jul 2024
Robust Pedestrian Detection via Constructing Versatile Pedestrian
  Knowledge Bank
Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank
Sungjune Park
Hyunjun Kim
Y. Ro
18
12
0
30 Apr 2024
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang
Zhenyi Lin
Qilong Wang
Pengfei Zhu
Qinghua Hu
26
11
0
13 Apr 2024
WorDepth: Variational Language Prior for Monocular Depth Estimation
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng
Daniel Wang
Fengyu Yang
Hyoungseob Park
Yangchao Wu
Stefano Soatto
Byung-Woo Hong
Dong Lao
Alex Wong
MDE
38
26
0
04 Apr 2024
Bayesian Exploration of Pre-trained Models for Low-shot Image
  Classification
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao
Yu Lei
Feng Zhou
Zhijie Deng
VLM
UQCV
BDL
38
1
0
30 Mar 2024
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for
  Vision-Language Models
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
Saurav Jha
Dong Gong
Lina Yao
CLIP
VLM
33
7
0
28 Mar 2024
CLIPose: Category-Level Object Pose Estimation with Pre-trained
  Vision-Language Knowledge
CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
Xiao Lin
Minghao Zhu
Ronghao Dang
Guangliang Zhou
Shaolong Shu
Feng Lin
Chengju Liu
Qi Chen
CLIP
35
7
0
24 Feb 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile
  Representations
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
36
53
0
31 Jan 2024
Concept-Guided Prompt Learning for Generalization in Vision-Language
  Models
Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Yi Zhang
Ce Zhang
Ke Yu
Yushun Tang
Zhihai He
VLM
MLLM
32
20
0
15 Jan 2024
Learning to Prompt Segment Anything Models
Learning to Prompt Segment Anything Models
Jiaxing Huang
Kai Jiang
Jingyi Zhang
Han Qiu
Lewei Lu
Shijian Lu
Eric P. Xing
VLM
LRM
40
7
0
09 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
20
22
0
03 Jan 2024
Domain Aligned CLIP for Few-shot Classification
Domain Aligned CLIP for Few-shot Classification
Muhammad Waleed Gondal
Jochen Gast
Inigo Alonso Ruiz
Richard Droste
Tommaso Macri
Suren Kumar
Luitpold Staudigl
VLM
11
11
0
15 Nov 2023
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Xue-mei Hu
Ce Zhang
Yi Zhang
Bowen Hai
Ke Yu
Zhihai He
MDE
VLM
22
17
0
02 Nov 2023
Integrating Language-Derived Appearance Elements with Visual Cues in
  Pedestrian Detection
Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Sungjune Park
Hyunjun Kim
Y. Ro
37
11
0
02 Nov 2023
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow
  Linguistic Reasoning
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
27
2
0
30 Oct 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video
  Transfer Learning
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
19
18
0
14 Sep 2023
Read-only Prompt Optimization for Vision-Language Few-shot Learning
Read-only Prompt Optimization for Vision-Language Few-shot Learning
Dongjun Lee
Seokwon Song
Jihee G. Suh
Joonmyeong Choi
S. Lee
Hyunwoo J.Kim
VLM
34
41
0
29 Aug 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S.Torr
Xiaoping Zhang
Yansong Tang
19
18
0
16 Aug 2023
Cross-Modal Concept Learning and Inference for Vision-Language Models
Cross-Modal Concept Learning and Inference for Vision-Language Models
Yi Zhang
Ce Zhang
Yushun Tang
Z. He
VLM
MLLM
CLIP
20
15
0
28 Jul 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot
  Vision-Language Tasks
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
13
4
0
07 Jun 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior
  Refinement
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Peng Gao
VLM
27
79
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
34
474
0
03 Apr 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with
  GPT and Prototype Guidance
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
23
53
0
29 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong
  Few-shot Learners
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
25
170
0
03 Mar 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
  Pre-training
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzhi Li
Pheng-Ann Heng
3DPC
20
52
0
27 Feb 2023
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
Vishaal Udandarao
Ankush Gupta
Samuel Albanie
VLM
MLLM
22
103
0
28 Nov 2022
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for
  Few-shot Image Classification
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification
Fang Peng
Xiaoshan Yang
Linhui Xiao
Yaowei Wang
Changsheng Xu
VLM
22
42
0
28 Nov 2022
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
Guangyi Chen
Weiran Yao
Xiangchen Song
Xinyue Li
Yongming Rao
Kun Zhang
VPVLM
VLM
6
62
0
03 Oct 2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzheng Ma
Xupeng Miao
Xuming He
Bin Cui
VLM
AAML
55
109
0
28 Sep 2022
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language
  Models
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
Manli Shu
Weili Nie
De-An Huang
Zhiding Yu
Tom Goldstein
Anima Anandkumar
Chaowei Xiao
VLM
VPVLM
175
279
0
15 Sep 2022
Can Language Understand Depth?
Can Language Understand Depth?
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
VLM
MDE
13
71
0
03 Jul 2022
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,835
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,913
0
31 Dec 2020
Cross Attention Network for Few-shot Classification
Cross Attention Network for Few-shot Classification
Rui Hou
Hong Chang
Bingpeng Ma
Shiguang Shan
Xilin Chen
202
629
0
17 Oct 2019
1