ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.03818
  4. Cited By
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

6 December 2023
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Alpha-CLIP: A CLIP Model Focusing on Wherever You Want"

17 / 67 papers shown
Title
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
41
1
0
01 Mar 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot
  Classification
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai
Qingsong Yao
Zihang Jiang
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
18
3
0
27 Feb 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large
  Language Models
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
Yuhang Cao
Pan Zhang
Xiao-wen Dong
Dahua Lin
Jiaqi Wang
27
10
0
22 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current
  Methodologies and Future Directions
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
28
9
0
20 Feb 2024
ControlCap: Controllable Region-level Captioning
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
21
5
0
31 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and
  Comprehension in Vision-Language Large Model
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
73
89
0
29 Jan 2024
Tokenize Anything via Prompting
Tokenize Anything via Prompting
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
18
22
0
14 Dec 2023
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang
Shengju Qian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
MLLM
22
19
0
07 Dec 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets
  Prompt Engineering
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
76
61
0
12 May 2023
InstantBooth: Personalized Text-to-Image Generation without Test-Time
  Finetuning
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi
Wei Xiong
Zhe-nan Lin
H. J. Jung
DiffM
113
272
0
06 Apr 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion
  Models
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
198
318
0
08 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and
  Text-to-Image Diffusion Models
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
169
161
0
28 Dec 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
186
218
0
24 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
206
698
0
28 Apr 2021
Previous
12