Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.12750
Cited By
SLIP: Self-supervision meets Language-Image Pre-training
23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SLIP: Self-supervision meets Language-Image Pre-training"
50 / 337 papers shown
Title
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
38
53
0
17 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
19
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
32
11
0
17 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramana
VLM
15
97
0
16 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
17
24
0
05 Jan 2023
FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion
Martin Pernuš
Clinton Fookes
Vitomir Štruc
Simon Dobrišek
DiffM
14
27
0
05 Jan 2023
Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
Robert Wolfe
Yiwei Yang
Billy Howe
Aylin Caliskan
DiffM
13
51
0
21 Dec 2022
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
32
26
0
16 Dec 2022
Retrieval-based Disentangled Representation Learning with Natural Language Supervision
Jiawei Zhou
Xiaoguang Li
Lifeng Shang
Xin Jiang
Qun Liu
L. Chen
DRL
19
6
0
15 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
26
30
0
14 Dec 2022
Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders
Jongseong Jang
Daeun Kyung
Seunghyeon Kim
Honglak Lee
Kyunghoon Bae
E. Choi
LM&MA
MedIm
25
10
0
14 Dec 2022
TIER: Text-Image Entropy Regularization for CLIP-style models
Anil Palepu
Andrew L. Beam
MedIm
16
6
0
13 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
25
1
0
08 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
19
317
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
19
11
0
29 Nov 2022
Context-Aware Robust Fine-Tuning
Xiaofeng Mao
YueFeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
VLM
CLIP
20
23
0
29 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
X. Wang
VLM
CLIP
CoGe
6
20
0
25 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
25
19
0
21 Nov 2022
Task Residual for Tuning Vision-Language Models
Tao Yu
Zhihe Lu
Xin Jin
Zhibo Chen
Xinchao Wang
VLM
CLIP
16
81
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
31
41
0
17 Nov 2022
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
6
4
0
14 Nov 2022
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Dennis Fedorishin
D. Mohan
Bhavin Jawade
S. Setlur
V. Govindaraju
VGen
19
10
0
06 Nov 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
21
7
0
19 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
52
35
0
19 Oct 2022
Non-Contrastive Learning Meets Language-Image Pre-Training
Jinghao Zhou
Li Dong
Zhe Gan
Lijuan Wang
Furu Wei
VLM
CLIP
17
25
0
17 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
24
21
0
09 Oct 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
VLM
22
19
0
30 Sep 2022
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
106
44
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
14
63
0
30 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
50
53
0
27 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
A. Roy-Chowdhury
AAML
36
17
0
20 Sep 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Yiren Jian
Chongyang Gao
Soroush Vosoughi
SSL
18
15
0
20 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
25
1
0
05 Sep 2022
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
14
0
0
31 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
19
27
0
29 Aug 2022
Prompt Tuning with Soft Context Sharing for Vision-Language Models
Kun Ding
Ying Wang
Pengzhang Liu
Qiang Yu
Hao Zhang
Shiming Xiang
Chunhong Pan
VPVLM
VLM
19
14
0
29 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
32
157
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
25
44
0
25 Aug 2022
Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features
Shichao Xu
Yikang Li
Jenhao Hsiao
C. Ho
Zhuang Qi
8
6
0
19 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
26
67
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
38
97
0
10 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
47
70
0
30 Jul 2022
V
2
^2
2
L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval
Wenhao Wang
Yifan Sun
Zongxin Yang
Yi Yang
VLM
16
3
0
26 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
19
47
0
26 Jul 2022
Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning
Shibani Santurkar
Yann Dubois
Rohan Taori
Percy Liang
Tatsunori Hashimoto
CLIP
VLM
11
41
0
15 Jul 2022
Contrastive Adapters for Foundation Model Group Robustness
Michael Zhang
Christopher Ré
VLM
8
61
0
14 Jul 2022
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang
Youcai Zhang
Ying Cheng
Weiwei Tian
Ruiwei Zhao
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
X. Zhang
VLM
18
14
0
12 Jul 2022
American == White in Multimodal Language-and-Image AI
Robert Wolfe
Aylin Caliskan
VLM
19
46
0
01 Jul 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
12
2
0
01 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Previous
1
2
3
4
5
6
7
Next