2208.12262
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
25 August 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
Hao Yang
Ming Zeng
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
Github (35★)
Papers citing
"MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining"
50 / 142 papers shown
Title
Locality Alignment Improves Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
525
11
0
14 Oct 2024
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Neural Information Processing Systems (NeurIPS), 2024
Benyuan Meng
Qianqian Xu
Zitai Wang
Zhiyong Yang
Xiaochun Cao
Qingming Huang
438
0
0
09 Oct 2024
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
Neural Information Processing Systems (NeurIPS), 2024
Benyuan Meng
Qianqian Xu
Zitai Wang
Xiaochun Cao
Qingming Huang
351
16
0
04 Oct 2024
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
IEEE International Conference on Robotics and Automation (ICRA), 2024
Nghia Nguyen
Minh Nhat Vu
Tung D. Ta
Baoru Huang
T. Vo
Ngan Le
Anh Nguyen
VLM
CLIP
196
9
0
26 Sep 2024
Global-Local Medical SAM Adaptor Based on Full Adaption
Meng Wang
Yarong Feng
Yongwei Tang
Tian Zhang
Yuxin Liang
Chao Lv
MedIm
208
1
0
26 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
341
16
0
10 Sep 2024
Masked Image Modeling: A Survey
International Journal of Computer Vision (IJCV), 2024
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
Andrii Zadaianchuk
405
17
0
13 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
381
8
0
08 Aug 2024
Enrich the content of the image Using Context-Aware Copy Paste
Qiushi Guo
VLM
308
1
0
11 Jul 2024
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Marc Lafon
Elias Ramzi
Clément Rambour
Nicolas Audebert
Nicolas Thome
VLM
262
34
0
01 Jul 2024
3D Feature Distillation with Object-Centric Priors
Georgios Tziafas
Yucheng Xu
Zhibin Li
Hamidreza Kasaei
325
1
0
26 Jun 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
375
3
0
23 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
138
1
0
19 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
301
2
0
05 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Tao Gui
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
250
7
0
03 Jun 2024
Cross-sensor self-supervised training and alignment for remote sensing
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
V. Marsocci
Nicolas Audebert
273
4
0
16 May 2024
Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Engineering Applications of Artificial Intelligence (EAAI), 2024
Sunyuan Qiang
Xianfei Li
Yanyan Liang
Wenlong Liao
Tao He
Pai Peng
ObjD
177
0
0
14 May 2024
OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images
Ye Mao
Junpeng Jing
K. Mikolajczyk
VLM
127
0
0
25 Apr 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
299
30
0
20 Apr 2024
Anchor-based Robust Finetuning of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Jinwei Han
Zhiwen Lin
Zhongyi Sun
Yingguo Gao
Ke Yan
Shouhong Ding
Yuan Gao
Gui-Song Xia
VLM
185
10
0
09 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV
VLM
365
47
0
02 Apr 2024
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
IEEE International Joint Conference on Neural Networks (IJCNN), 2024
Barbara Toniella Corradini
Mustafa Shukor
Paul Couairon
Guillaume Couairon
Franco Scarselli
Matthieu Cord
DiffM
VLM
383
10
0
29 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM
CLIP
275
61
0
25 Mar 2024
Centered Masking for Language-Image Pre-Training
Mingliang Liang
Martha Larson
VLM
CLIP
146
5
0
23 Mar 2024
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
IEEE Robotics and Automation Letters (RA-L), 2024
Yuhang Zheng
Xiangyu Chen
Yupeng Zheng
Songen Gu
Runyi Yang
...
Chao Yang
Dawei Wang
Zhen Chen
Xiaoxiao Long
Meiqing Wang
189
89
0
14 Mar 2024
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
Muhammad Gul Zain Ali Khan
Muhammad Ferjad Naeem
F. Tombari
Luc Van Gool
Didier Stricker
Muhammad Zeshan Afzal
VLM
CLIP
165
0
0
11 Mar 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
238
0
0
01 Mar 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Zihang Jiang
Qingsong Yao
Zihang Jiang
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
243
32
0
27 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
IEEE Transactions on Intelligent Vehicles (TIV), 2024
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
327
26
0
05 Feb 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
International Conference on Learning Representations (ICLR), 2024
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
264
16
0
29 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
373
25
0
16 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
European Conference on Computer Vision (ECCV), 2024
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
160
12
0
12 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
International Conference on Machine Learning (ICML), 2024
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
183
7
0
04 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
235
27
0
31 Dec 2023
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Zeqiang Wei
Kai Jin
Xiuzhuang Zhou
MedIm
284
8
0
26 Dec 2023
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Qinying Liu
Wei Wu
Kecheng Zheng
Zhan Tong
Jiawei Liu
Yu Liu
Wei Chen
Zilei Wang
Yujun Shen
VLM
301
7
0
21 Dec 2023
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis
Judith E. Fan
Yulia Gryaditskaya
VLM
3DV
274
1
0
18 Dec 2023
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Yasser Benigmim
Subhankar Roy
S. Essid
Vicky Kalogeiton
Stéphane Lathuilière
343
31
0
15 Dec 2023
CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen
AAAI Conference on Artificial Intelligence (AAAI), 2023
Hao Zhang
Fang Li
Lu Qi
Ming-Hsuan Yang
Narendra Ahuja
212
14
0
09 Dec 2023
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
325
4
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Yuan Liu
VLM
CLIP
331
161
0
06 Dec 2023
Novel class discovery meets foundation models for 3D semantic segmentation
Luigi Riz
Cristiano Saltori
Yiming Wang
Elisa Ricci
Fabio Poiesi
3DPC
281
1
0
06 Dec 2023
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
291
71
0
01 Dec 2023
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
198
32
0
01 Dec 2023
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
European Conference on Computer Vision (ECCV), 2023
Xiaoxuan He
Yifan Yang
Xinyang Jiang
Xufang Luo
Haoji Hu
Siyun Zhao
Dongsheng Li
Yuqing Yang
Lili Qiu
309
5
0
24 Nov 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
231
17
0
24 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
238
55
0
20 Oct 2023
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
IEEE International Conference on Computer Vision (ICCV), 2023
Tianyi Wei
DongDong Chen
Wenbo Zhou
Jing Liao
Weiming Zhang
Gang Hua
Neng H. Yu
141
19
0
16 Oct 2023
Black-box Targeted Adversarial Attack on Segment Anything (SAM)
Sheng Zheng
Chaoning Zhang
Xinhong Hao
AAML
350
12
0
16 Oct 2023
Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images
Che Liu
Anand Shah
Wenjia Bai
Rossella Arcucci
MedIm
338
21
0
10 Oct 2023