arXiv: 2312.03818
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
6 December 2023
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
Papers citing "Alpha-CLIP: A CLIP Model Focusing on Wherever You Want"
50 / 67 papers shown
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
34
0
0
08 May 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Gabriel Sarch
Balasaravanan Thoravi Kumaravel
Sahithya Ravi
Vibhav Vineet
A. D. Wilson
34
0
0
02 May 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
44
0
0
27 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
69
0
0
20 Apr 2025
MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation
Nico Catalano
Stefano Samele
Paolo Pertino
Matteo Matteucci
3DPC
40
0
0
10 Apr 2025
Are We Done with Object-Centric Learning?
Alexander Rubinstein
Ameya Prabhu
Matthias Bethge
Seong Joon Oh
OCL
513
0
0
09 Apr 2025
URECA: Unique Region Caption Anything
Sangbeom Lim
J. Kim
Heeji Yoon
Jaewoo Jung
Seungryong Kim
26
0
0
07 Apr 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
44
0
0
25 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
40
1
0
21 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
41
0
0
18 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
31
0
0
17 Mar 2025
ACMo: Attribute Controllable Motion Generation
Mingjie Wei
Xuemei Xie
G. Shi
52
0
0
14 Mar 2025
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Michael Green
Matan Levy
Issar Tzachor
Dvir Samuel
N. Darshan
Rami Ben-Ari
49
0
0
10 Mar 2025
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
Suhwan Cho
Seunghoon Lee
Minhyeok Lee
Jungho Lee
Sangyoun Lee
VOS
77
0
0
05 Mar 2025
A Zero-Shot Learning Approach for Ephemeral Gully Detection from Remote Sensing using Vision Language Models
Seyed Mohamad Ali Tousi
Ramy M. A. Farag
Jacket Demby's
Gbenga Omotara
John A. Lory
Guilherme N. DeSouza
47
0
0
03 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
32
0
0
01 Mar 2025
Open-Vocabulary Semantic Part Segmentation of 3D Human
Keito Suzuki
Bang Du
Girish Krishnan
Kunyao Chen
Runfa Li
Truong Thao Nguyen
3DH
VLM
89
0
0
27 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
53
7
0
20 Feb 2025
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou
Amine Ouasfi
Vincent Gripon
A. Boukhayma
VLM
38
0
0
19 Jan 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
40
0
0
19 Jan 2025
How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
Siyuan Feng
Teruya Yoshinaga
Katsuhiko Hayashi
Koki Washio
Hidetaka Kamigaito
28
0
0
26 Dec 2024
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Shuai Lyu
Fangjian Liao
Zeqi Ma
Rongchen Zhang
Dongmei Mo
W. Wong
66
0
0
22 Dec 2024
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition
Ethan Baron
Idan Tankel
Peter Tu
Guy Ben-Yosef
VLM
67
0
0
18 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
X. Zhang
K. Chen
Yu Qiao
D. Lin
Jiaqi Wang
KELM
78
12
0
12 Dec 2024
Detailed Object Description with Controllable Dimensions
Xinran Wang
H. Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Z. Ma
Jun Guo
78
0
0
28 Nov 2024
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe
VLM
81
6
0
23 Nov 2024
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
Xuechen Guo
Wenhao Chai
Shi-Yan Li
Gaoang Wang
13
5
0
19 Oct 2024
A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem
Kun Ding
Ying Wang
Gaofeng Meng
Shiming Xiang
VLM
24
0
0
15 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
23
0
0
09 Oct 2024
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
Youngtaek Oh
Jae-Won Cho
Dong-Jin Kim
In So Kweon
Junmo Kim
VLM
CoGe
CLIP
9
4
0
07 Oct 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang
Chengyu Wang
Kunzhe Huang
Jun Huang
Lianwen Jin
CLIP
VLM
16
3
0
01 Oct 2024
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Nghia Nguyen
Minh Nhat Vu
Tung D. Ta
Baoru Huang
T. Vo
Ngan Le
Anh Nguyen
VLM
CLIP
27
3
0
26 Sep 2024
Click2Mask: Local Editing with Dynamic Mask Generation
Omer Regev
Omri Avrahami
Dani Lischinski
DiffM
32
1
0
12 Sep 2024
MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model
Xinyang Wang
Yi Yang
Minfeng Zhu
Kecheng Zheng
Shi Liu
Wei Chen
VPVLM
MLLM
VLM
26
1
0
19 Aug 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
40
7
0
31 Jul 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
35
6
0
29 Jul 2024
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Walid Bousselham
Sofian Chaybouti
Christian Rupprecht
Vittorio Ferrari
Hilde Kuehne
46
0
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
31
3
0
26 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
21
2
0
08 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
40
98
0
03 Jul 2024
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
Zhaotian Weng
Zijun Gao
Jerone Andrews
Jieyu Zhao
20
0
0
03 Jul 2024
CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
Yigit Ekin
Ahmet Burak Yildirim
Erdem Çağlar
Aykut Erdem
Erkut Erdem
Aysegül Dündar
DiffM
22
0
0
13 Jun 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
30
3
0
11 Jun 2024
Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection
Koji Takeda
Kanji Tanaka
Yoshimasa Nakamura
Asako Kanezaki
43
0
0
10 May 2024
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
Bozhi Luan
Hao Feng
Hong Chen
Yonghui Wang
Wen-gang Zhou
Houqiang Li
MLLM
16
2
0
15 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
23
107
0
09 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
14
7
0
01 Apr 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Jiaqi Wang
CLIP
VLM
20
106
0
22 Mar 2024
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
23
9
0
15 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
45
6
0
14 Mar 2024