2208.12262
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
25 August 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
Hao Yang
Ming Zeng
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
Papers citing
"MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining"
50 / 142 papers shown
Title
PowerCLIP: Powerset Alignment for Contrastive Pre-Training
Masaki Kawamura
Nakamasa Inoue
Rintaro Yanagi
Hirokatsu Kataoka
Rio Yokota
CLIP
VLM
89
0
0
28 Nov 2025
SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
Lin Chen
Yingjian Zhu
Qi Yang
Xin Niu
Kun Ding
Shiming Xiang
VLM
113
0
0
25 Nov 2025
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs
ACM Multimedia (MM), 2024
Daiqing Wu
Dongbao Yang
Yu Zhou
Can Ma
70
4
0
21 Nov 2025
MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment
Bingyu Li
Feiyu Wang
Da Zhang
Zhiyuan Zhao
Junyu Gao
Xuelong Li
VLM
133
1
0
17 Oct 2025
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang
Hongsi Liu
Bohan Li
Jiawei He
Zekun Qi
Yunnan Wang
Shengyang Zhao
Xinqiang Yu
Wenjun Zeng
Jianfeng Dong
VLM
MDE
173
0
0
10 Oct 2025
KeySG: Hierarchical Keyframe-Based 3D Scene Graphs
Abdelrhman Werby
Dennis Rotondi
Fabio Scaparro
Kai O. Arras
3DV
106
0
0
01 Oct 2025
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim
Heeseong Shin
Eunbeen Hong
Heeji Yoon
Anurag Arnab
Paul Hongsuck Seo
Sunghwan Hong
Seungryong Kim
172
5
0
22 Sep 2025
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Yuheng Shi
Xiaohuan Pei
Minjing Dong
Chang Xu
ObjD
241
0
0
21 Sep 2025
Informative Text-Image Alignment for Visual Affordance Learning with Foundation Models
Qian Zhang
Lin Zhang
Xing Fang
Mingxin Zhang
Zhiyuan Wei
Ran Song
Wei Zhang
116
0
0
21 Sep 2025
Synthetic Captions for Open-Vocabulary Zero-Shot Segmentation
Tim Lebailly
Vijay Veerabadran
Satwik Kottur
Karl Ridgeway
Michael L. Iuzzolino
VLM
79
0
0
15 Sep 2025
OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds
Chongyu Wang
Kunlei Jing
J. Zhu
Di Wang
3DPC
149
0
0
13 Sep 2025
Splat Feature Solver
Butian Xiong
Rong Liu
Kenneth Xu
Meida Chen
Andrew Feng
89
1
0
17 Aug 2025
Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation
Xu Tang
Junan Jia
Yijing Wang
Jingjing Ma
Xiangrong Zhang
123
0
0
13 Aug 2025
Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation
Xusheng Liang
Lihua Zhou
Nianxin Li
Miao Xu
Ziyang Song
Dong Yi
Jinlin Wu
Hongbin Liu
Jiebo Luo
Zhen Lei
100
0
0
07 Aug 2025
Point2Act: Efficient 3D Distillation of Multimodal LLMs for Zero-Shot Context-Aware Grasping
Sang Min Kim
Hyeongjun Heo
J. Kim
Yonghyeon Lee
Young Min Kim
3DPC
82
0
0
05 Aug 2025
Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval
Shreyank N Gowda
Xiaobo Jin
Christian Wagner
MedIm
88
0
0
05 Aug 2025
MINR: Implicit Neural Representations with Masked Image Modelling
Sua Lee
Joonhun Lee
Myungjoo Kang
115
1
0
30 Jul 2025
FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang
Zhiwei Ning
Jianyu Ding
Xuanang Gao
Yin Li
Dongsheng Jiang
J. Yang
Wei Liu
CLIP
VLM
202
5
0
14 Jul 2025
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
Shiting Xiao
Rishabh Kabra
Yuhang Li
Donghyun Lee
João Carreira
Priyadarshini Panda
VLM
243
0
0
07 Jul 2025
Multimodal Medical Image Binding via Shared Text Embeddings
Yunhao Liu
SuYang Xi
Shiqi Liu
Hong Ding
Chicheng Jin
Chong Zhong
Junjun He
Catherine C. Liu
Yiqing Shen
180
2
0
22 Jun 2025
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Juan Yeo
S. Cha
Jiwoo Song
Hyunbin Jin
Taesup Kim
VLM
116
1
0
10 Jun 2025
SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models
Arnab Debnath
Gregory J. Stein
Jana Kosecka
LM&Ro
220
1
0
04 Jun 2025
VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
Manas Mehta
Yimu Pan
Kelly Gallagher
Alison D. Gernand
Jeffery A. Goldstein
Delia Mwinyelle
Leena Mithal
J. Z. Wang
126
0
0
02 Jun 2025
Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2025
Siddharth Ancha
Sunshine Jiang
Travis Manderson
Laura Brandt
Yilun Du
Philip R. Osteen
Nicholas Roy
396
2
0
28 May 2025
Locality-Aware Zero-Shot Human-Object Interaction Detection
Computer Vision and Pattern Recognition (CVPR), 2025
Sanghyun Kim
Deunsol Jung
Minsu Cho
VLM
309
3
0
26 May 2025
Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
372
1
0
12 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Wenwen Qiang
Jianqi Zhang
Jingyao Wang
Changwen Zheng
VLM
325
0
0
10 May 2025
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li
Yongqian Li
Yali Fu
Jiahong Liu
Yixin Liu
Menglin Yang
Irwin King
VLM
284
2
0
19 Apr 2025
DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Fuhao Li
Long Zeng
247
0
0
11 Apr 2025
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Computer Vision and Pattern Recognition (CVPR), 2025
Can Zhang
G. Lee
202
4
0
09 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
233
0
0
08 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
International Conference on Learning Representations (ICLR), 2025
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
243
5
0
03 Apr 2025
GOAL: Global-local Object Alignment Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
849
6
0
22 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Computer Vision and Pattern Recognition (CVPR), 2025
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
261
5
0
21 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
179
0
0
10 Mar 2025
Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
Meng Wang
Fan Wu
Yunchuan Qin
Ruihui Li
Zhuo Tang
KenLi Li
3DPC
299
1
0
08 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
346
32
0
05 Mar 2025
ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting
Dexter Ong
Yuezhan Tao
Varun Murali
Igor Spasojevic
Vijay Kumar
Pratik Chaudhari
3DGS
324
8
0
27 Feb 2025
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yanpeng Zhao
Yiwei Hao
Siyu Gao
Yunbo Wang
Xiaokang Yang
OCL
390
4
0
17 Feb 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Tianying Wang
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
175
2
0
29 Jan 2025
sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
International Conference on Digital Health (ICDH), 2023
Jingyuan Chen
Xingtai Lv
Mie Anderson
Natalie Hauglund
Celia Kjaerby
Verena Untiet
Maiken Nedergaard
Jiebo Luo
352
3
0
28 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
662
63
0
17 Jan 2025
GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs
Xingrui Wang
Cuiling Lan
Hanxin Zhu
Zhibo Chen
Yan Lu
3DGS
400
9
0
22 Dec 2024
Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation
International Conference on Artificial Neural Networks (ICANN), 2024
J. Zhang
Li Zhang
Shijian Li
VLM
328
0
0
18 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
Wen Liu
Xinyu Wang
3DGS
ViT
417
27
0
17 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Computer Vision and Pattern Recognition (CVPR), 2024
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
284
20
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
633
7
0
02 Dec 2024
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2024
Weinan Zhang
Lu Zhang
Ping Hu
Liqian Ma
Yunzhi Zhuge
Huchuan Lu
3DGS
297
4
0
29 Nov 2024
Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization
International Journal of Computer Vision (IJCV), 2024
Xiao Guo
Xiaohong Liu
I. Masi
Xiaoming Liu
326
24
0
31 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
388
16
0
21 Oct 2024