Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.03818
Cited By
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
6 December 2023
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Alpha-CLIP: A CLIP Model Focusing on Wherever You Want"
17 / 67 papers shown
Title
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
41
1
0
01 Mar 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai
Qingsong Yao
Zihang Jiang
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
18
3
0
27 Feb 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
Yuhang Cao
Pan Zhang
Xiao-wen Dong
Dahua Lin
Jiaqi Wang
27
10
0
22 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
28
9
0
20 Feb 2024
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
21
5
0
31 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
73
89
0
29 Jan 2024
Tokenize Anything via Prompting
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
18
22
0
14 Dec 2023
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang
Shengju Qian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
MLLM
22
19
0
07 Dec 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
76
61
0
12 May 2023
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi
Wei Xiong
Zhe-nan Lin
H. J. Jung
DiffM
113
272
0
06 Apr 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
198
318
0
08 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
169
161
0
28 Dec 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
186
218
0
24 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
206
698
0
28 Apr 2021
Previous
1
2