Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.12750
Cited By
SLIP: Self-supervision meets Language-Image Pre-training
23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SLIP: Self-supervision meets Language-Image Pre-training"
50 / 337 papers shown
Title
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing
Avigyan Bhattacharya
Mainak Singha
Ankit Jha
Biplab Banerjee
SSL
VLM
19
6
0
27 Nov 2023
Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective
Sahil Verma
Gantavya Bhatt
Avi Schwarzschild
Soumye Singhal
Arnav M. Das
Chirag Shah
John P Dickerson
Jeff Bilmes
J. Bilmes
AAML
59
1
0
25 Nov 2023
Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models
Yichao Cao
Qingfei Tang
Xiu Su
Chen Song
Shan You
Xiaobo Lu
Chang Xu
20
21
0
07 Nov 2023
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Bobby Azad
Reza Azad
Sania Eskandari
Afshin Bozorgpour
A. Kazerouni
I. Rekik
Dorit Merhof
VLM
MedIm
93
59
0
28 Oct 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
27
4
0
20 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
17
32
0
20 Oct 2023
CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
Kihyun You
Jawook Gu
Jiyeon Ham
Beomhee Park
Jiho Kim
Eun K. Hong
Woonhyuk Baek
Byungseok Roh
CLIP
VLM
8
57
0
20 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
41
1
0
14 Oct 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
29
28
0
11 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
22
26
0
11 Oct 2023
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
Yihao Xue
Siddharth Joshi
Dang Nguyen
Baharan Mirzasoleiman
VLM
24
4
0
08 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
29
0
0
07 Oct 2023
Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Wenhan Yang
Jingdong Gao
Baharan Mirzasoleiman
VLM
24
6
0
05 Oct 2023
Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard
Avinash Madasu
Tiep Le
Gustavo Lujan Moreno
Vasudev Lal
VLM
21
4
0
04 Oct 2023
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
Zixiang Chen
Yihe Deng
Yuanzhi Li
Quanquan Gu
VLM
23
10
0
02 Oct 2023
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding
Ana Ezquerro
Carlos Gómez-Rodríguez
Kevin Dela Rosa
Derek Hao Hu
19
6
0
28 Sep 2023
Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
Yun Xing
Jian Kang
Aoran Xiao
Jiahao Nie
Ling Shao
Shijian Lu
VLM
25
12
0
24 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
26
53
0
21 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
23
13
0
13 Sep 2023
TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification
M. Jehanzeb Mirza
Leonid Karlinsky
Wei Lin
Horst Possegger
Rogerio Feris
Horst Bischof
VLM
27
6
0
13 Sep 2023
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang
C. Albrecht
Nassim Ait Ali Braham
Chenying Liu
Zhitong Xiong
Xiaoxiang Zhu
SSL
19
16
0
11 Sep 2023
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels
Bo Wan
Tinne Tuytelaars
VLM
19
3
0
10 Sep 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
20
42
0
29 Aug 2023
Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy
Leo Fillioux
E. Gontran
J. Cartry
J. Mathieu
Sabrina Bedja
A. Boilève
P. Cournède
F. Jaulin
Stergios Christodoulidis
Maria Vakalopoulou
9
6
0
28 Aug 2023
GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning
Mainak Singha
Ankit Jha
Biplab Banerjee
VLM
17
4
0
22 Aug 2023
An Empirical Study of CLIP for Text-based Person Search
Min Cao
Yang Bai
Ziyin Zeng
Mang Ye
Min Zhang
VLM
36
36
0
19 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
23
45
0
16 Aug 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
Kaixin Cai
Pengzhen Ren
Yi Zhu
Hang Xu
Jian-zhuo Liu
Changlin Li
Guangrun Wang
Xiaodan Liang
VLM
22
14
0
09 Aug 2023
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
Haowei Wang
Jiji Tang
Jiayi Ji
Xiaoshuai Sun
Rongsheng Zhang
...
Minda Zhao
Lincheng Li
zeng zhao
Tangjie Lv
R. Ji
3DV
21
13
0
06 Aug 2023
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Xuefeng Hu
Ke Zhang
Lu Xia
Albert Y. C. Chen
Jiajia Luo
...
Nan Qiao
Xiao Zeng
Min Sun
Cheng-Hao Kuo
Ram Nevatia
VLM
11
24
0
04 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection
Yichao Cao
Qingfei Tang
Fengyuan Yang
Xiu Su
Shan You
Xiaobo Lu
Chang Xu
24
16
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
21
27
0
24 Jul 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng
Li Lin
Junyan Lyu
Yijin Huang
Wenhan Luo
Xiaoying Tang
MedIm
16
44
0
24 Jul 2023
GIST: Generating Image-Specific Text for Fine-grained Object Classification
Kathleen M. Lewis
Emily Mu
Adrian V. Dalca
John Guttag
VLM
19
7
0
21 Jul 2023
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
S. Basu
S. Hu
Maziar Sanjabi
Daniela Massiceti
S. Feizi
VLM
11
3
0
18 Jul 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
22
10
0
12 Jul 2023
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
Pratyush Maini
Sachin Goyal
Zachary Chase Lipton
J. Zico Kolter
Aditi Raghunathan
VLM
29
33
0
06 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Massimiliano Mancini
Zeynep Akata
MLLM
VLM
16
3
0
01 Jul 2023
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Zibo Zhao
Wen Liu
Xin Chen
Xi Zeng
Rui Wang
Pei Cheng
Bin-Bin Fu
Tao Chen
Gang Yu
Shenghua Gao
DiffM
20
87
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
27
134
0
28 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
11
38
0
21 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
27
52
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
66
188
0
19 Jun 2023
Multi-task 3D building understanding with multi-modal pretraining
Shicheng Xu
3DPC
14
2
0
16 Jun 2023
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu
Maziar Sanjabi
Y. Ma
Kamalika Chaudhuri
Chuan Guo
11
12
0
15 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
18
7
0
13 Jun 2023
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
Ming Y. Lu
Bowen Chen
Andrew Zhang
Drew F. K. Williamson
Richard J. Chen
Tong Ding
L. Le
Yung-Sung Chuang
Faisal Mahmood
VLM
MedIm
22
96
0
13 Jun 2023
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
24
152
0
12 Jun 2023
Previous
1
2
3
4
5
6
7
Next