2208.12262
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
25 August 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
Hao Yang
Ming Zeng
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
Github (35★)
Papers citing "MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining"
42 / 142 papers shown
Improving Compositional Text-to-image Generation with Large Vision-Language Models
Song Wen
Guian Fang
Renrui Zhang
Shiyang Feng
Hao Dong
Dimitris N. Metaxas
192
23
0
10 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
IEEE International Conference on Computer Vision (ICCV), 2023
Chengyang Zhao
Songlin Yang
Zhenfang Chen
Mingyu Ding
Chuang Gan
337
23
0
10 Oct 2023
ALT-Pilot: Autonomous navigation with Language augmented Topometric maps
Mohammad Omama
Pranav Inani
Pranjal Paul
Sarat Chandra Yellapragada
Krishna Murthy Jatavallabhula
Sandeep Chinchali
Madhava Krishna
125
20
0
03 Oct 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
286
37
0
02 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
285
33
0
31 Aug 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
IEEE International Conference on Computer Vision (ICCV), 2023
Qidong Huang
Xiaoyi Dong
DongDong Chen
Yinpeng Chen
Lu Yuan
Gang Hua
Weiming Zhang
Neng H. Yu
AAML
274
11
0
20 Aug 2023
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
Francesco Taioli
Federico Cunico
Federico Girella
Riccardo Bologna
Alessandro Farinelli
Marco Cristani
173
8
0
17 Aug 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Kaixin Cai
Pengzhen Ren
Yi Zhu
Hang Xu
Jian-zhuo Liu
Changlin Li
Guangrun Wang
Xiaodan Liang
VLM
144
19
0
09 Aug 2023
Unsupervised Camouflaged Object Segmentation as Domain Adaptation
Yi Zhang
Chengyi Wu
192
7
0
08 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
408
150
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Computer Vision and Pattern Recognition (CVPR), 2023
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
309
71
0
24 Jul 2023
Unified Open-Vocabulary Dense Visual Prediction
IEEE transactions on multimedia (IEEE TMM), 2023
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
VLM
171
53
0
17 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
IEEE International Conference on Computer Vision (ICCV), 2023
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Zhuowen Tu
Haoran Su
VLM
311
40
0
06 Jul 2023
Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification
Neural Information Processing Systems (NeurIPS), 2023
Rui Wang
Peipei Li
Huaibo Huang
Chunshui Cao
Xiao-Yu Zhang
Zhaofeng He
220
18
0
24 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
International Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
243
39
0
12 Jun 2023
COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation
Jia-Qi Yang
Chen Dai
OU Dan
Dongshuai Li
Ju Huang
De-Chuan Zhan
Xiaoyi Zeng
Yang Yang
253
5
0
08 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
208
12
0
01 Jun 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning
Yujia Xie
Dongdong Chen
Zeyin Song
Lu Yuan
Yonghong Tian
QiXiang Ye
Liuliang Yuan
132
9
0
22 May 2023
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval
Bhanu Prakash Voutharoja
Peng Wang
Lei Wang
Vivienne Guan
135
6
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
439
150
0
18 May 2023
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL
CLIP
VLM
323
26
0
15 May 2023
CLIP-S^4: Language-Guided Self-Supervised Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Wenbin He
Suphanut Jamonnak
Liangke Gou
Liu Ren
VLM
311
45
0
01 May 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
221
16
0
12 Apr 2023
SATR: Zero-Shot Semantic Segmentation of 3D Shapes
IEEE International Conference on Computer Vision (ICCV), 2023
Ahmed Abdelreheem
Ivan Skorokhodov
M. Ovsjanikov
Peter Wonka
3DPC
278
60
0
11 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
298
34
0
31 Mar 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
IEEE International Conference on Computer Vision (ICCV), 2023
Lingdong Kong
You-Chen Liu
Xin Li
Runnan Chen
Wenwei Zhang
Jiawei Ren
Liang Pan
Kaili Chen
Ziwei Liu
355
127
0
30 Mar 2023
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
Sifan Long
Zhen Zhao
Junkun Yuan
Zichang Tan
Jiangjiang Liu
Luping Zhou
Sheng-sheng Wang
Jingdong Wang
VLM
270
5
0
30 Mar 2023
Supervised Masked Knowledge Distillation for Few-Shot Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Hanxi Lin
G. Han
Jiawei Ma
Shiyuan Huang
Xudong Lin
Shih-Fu Chang
324
48
0
25 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
169
44
0
04 Mar 2023
Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Xinyu Jiao
Yunlong Wang
Jiusi Li
Zelin Qian
Jun Wen
Mengmeng Yang
Ke Wang
Diange Yang
326
43
0
02 Mar 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
International Conference on Learning Representations (ICLR), 2023
Pengzhen Ren
Changlin Li
Hang Xu
Yi Zhu
Guangrun Wang
Jian-zhuo Liu
Xiaojun Chang
Xiaodan Liang
189
57
0
31 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023
Liya Wang
A. Tien
350
16
0
28 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
219
77
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Computer Vision and Pattern Recognition (CVPR), 2023
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
136
14
0
17 Jan 2023
Attentive Mask CLIP
IEEE International Conference on Computer Vision (ICCV), 2022
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
151
32
0
16 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
137
46
0
12 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
257
117
0
08 Dec 2022
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
International Conference on Machine Learning (ICML), 2022
Hanqing Zhao
Dianmo Sheng
Jianmin Bao
Dongdong Chen
Dong Chen
...
Ce Liu
Wenbo Zhou
Qi Chu
Weiming Zhang
Neng H. Yu
VLM
DiffM
205
59
0
07 Dec 2022
Scaling Language-Image Pre-training via Masking
Computer Vision and Pattern Recognition (CVPR), 2022
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
334
382
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Computer Vision and Pattern Recognition (CVPR), 2022
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
115
18
0
29 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Computer Vision and Pattern Recognition (CVPR), 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
189
54
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
248
25
0
17 Nov 2022