Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.01403
Cited By
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
2 October 2023
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction"
19 / 19 papers shown
Title
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
42
0
0
08 May 2025
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Xu Zheng
Dongyue Li
Chong Yi
Seigo Ito
D. Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
VLM
50
0
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
J. Z. Wang
Haoran Xu
Ziyong Feng
Y. Wang
VLM
71
0
0
23 Apr 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
96
0
0
14 Mar 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
65
7
0
20 Feb 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Xiangyu Gao
Yu Dai
Benliu Qiu
Hongliang Li
Heqian Qiu
Hongliang Li
ObjD
VLM
82
0
0
28 Jan 2025
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
Zhifang Zhang
Shuo He
Bingquan Shen
Lei Feng
Lei Feng
AAML
38
0
0
29 Dec 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
30
3
0
21 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding
Guibiao Liao
Jiankun Li
Zhenyu Bao
Xiaoqing Ye
Jingdong Wang
Qing Li
Kanglin Liu
3DGS
33
13
0
22 Apr 2024
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Sina Hajimiri
Ismail Ben Ayed
Jose Dolz
VLM
31
22
0
12 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
35
7
0
10 Apr 2024
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Luting Wang
Yi Liu
Penghui Du
Zihan Ding
Yue Liao
Qiaosong Qi
Biaolong Chen
Si Liu
ObjD
VLM
68
61
0
10 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
209
318
0
08 Mar 2023
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
Yue Han
Jiangning Zhang
Zhucun Xue
Chao Xu
Xintian Shen
Yabiao Wang
Chengjie Wang
Yong Liu
Xiangtai Li
27
16
0
03 Jan 2023
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
223
897
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
1