ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.12750
  4. Cited By
SLIP: Self-supervision meets Language-Image Pre-training

SLIP: Self-supervision meets Language-Image Pre-training

23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
    VLM
    CLIP
ArXivPDFHTML

Papers citing "SLIP: Self-supervision meets Language-Image Pre-training"

50 / 337 papers shown
Title
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
38
53
0
17 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
19
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
32
11
0
17 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with
  Multimodal Models
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramana
VLM
15
97
0
16 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
17
24
0
05 Jan 2023
FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion
FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion
Martin Pernuš
Clinton Fookes
Vitomir Štruc
Simon Dobrišek
DiffM
14
27
0
05 Jan 2023
Contrastive Language-Vision AI Models Pretrained on Web-Scraped
  Multimodal Data Exhibit Sexual Objectification Bias
Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
Robert Wolfe
Yiwei Yang
Billy Howe
Aylin Caliskan
DiffM
13
51
0
21 Dec 2022
Attentive Mask CLIP
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
32
26
0
16 Dec 2022
Retrieval-based Disentangled Representation Learning with Natural
  Language Supervision
Retrieval-based Disentangled Representation Learning with Natural Language Supervision
Jiawei Zhou
Xiaoguang Li
Lifeng Shang
Xin Jiang
Qun Liu
L. Chen
DRL
19
6
0
15 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
26
30
0
14 Dec 2022
Significantly Improving Zero-Shot X-ray Pathology Classification via
  Fine-tuning Pre-trained Image-Text Encoders
Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders
Jongseong Jang
Daeun Kyung
Seunghyeon Kim
Honglak Lee
Kyunghoon Bae
E. Choi
LM&MA
MedIm
25
10
0
14 Dec 2022
TIER: Text-Image Entropy Regularization for CLIP-style models
TIER: Text-Image Entropy Regularization for CLIP-style models
Anil Palepu
Andrew L. Beam
MedIm
16
6
0
13 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
25
1
0
08 Dec 2022
Scaling Language-Image Pre-training via Masking
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
19
317
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph
  Riddles
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
19
11
0
29 Nov 2022
Context-Aware Robust Fine-Tuning
Context-Aware Robust Fine-Tuning
Xiaofeng Mao
YueFeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
VLM
CLIP
20
23
0
29 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
X. Wang
VLM
CLIP
CoGe
6
20
0
25 Nov 2022
Unifying Vision-Language Representation Space with Single-tower
  Transformer
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
25
19
0
21 Nov 2022
Task Residual for Tuning Vision-Language Models
Task Residual for Tuning Vision-Language Models
Tao Yu
Zhihe Lu
Xin Jin
Zhibo Chen
Xinchao Wang
VLM
CLIP
16
81
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
  Information
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
31
41
0
17 Nov 2022
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual
  representations
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
6
4
0
14 Nov 2022
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source
  Localization
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Dennis Fedorishin
D. Mohan
Bhavin Jawade
S. Setlur
V. Govindaraju
VGen
19
10
0
06 Nov 2022
VTC: Improving Video-Text Retrieval with User Comments
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
21
7
0
19 Oct 2022
A Unified View of Masked Image Modeling
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
52
35
0
19 Oct 2022
Non-Contrastive Learning Meets Language-Image Pre-Training
Non-Contrastive Learning Meets Language-Image Pre-Training
Jinghao Zhou
Li Dong
Zhe Gan
Lijuan Wang
Furu Wei
VLM
CLIP
17
25
0
17 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature
  Alignment
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
24
21
0
09 Oct 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text
  Pre-training
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
VLM
22
19
0
30 Sep 2022
Data Poisoning Attacks Against Multimodal Encoders
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
106
44
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
14
63
0
30 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
50
53
0
27 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
GAMA: Generative Adversarial Multi-Object Scene Attacks
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
A. Roy-Chowdhury
AAML
36
17
0
20 Sep 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence
  Embeddings
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Yiren Jian
Chongyang Gao
Soroush Vosoughi
SSL
18
15
0
20 Sep 2022
Design of the topology for contrastive visual-textual alignment
Design of the topology for contrastive visual-textual alignment
Zhun Sun
25
1
0
05 Sep 2022
Injecting Image Details into CLIP's Feature Space
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
14
0
0
31 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and
  Hierarchical Alignment
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
19
27
0
29 Aug 2022
Prompt Tuning with Soft Context Sharing for Vision-Language Models
Prompt Tuning with Soft Context Sharing for Vision-Language Models
Kun Ding
Ying Wang
Pengzhang Liu
Qiang Yu
Hao Zhang
Shiming Xiang
Chunhong Pan
VPVLM
VLM
19
14
0
29 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image
  Pretraining
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
32
157
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
25
44
0
25 Aug 2022
Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on
  Aligned Visual-Textual Features
Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features
Shichao Xu
Yikang Li
Jenhao Hsiao
C. Ho
Zhuang Qi
8
6
0
19 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
26
67
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and
  Robustness of CLIP
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
38
97
0
10 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision
  and Beyond
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
47
70
0
30 Jul 2022
V$^2$L: Leveraging Vision and Vision-language Models into Large-scale
  Product Retrieval
V2^22L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval
Wenhao Wang
Yifan Sun
Zongxin Yang
Yi Yang
VLM
16
3
0
26 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive
  Language-Image Pre-training
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
19
47
0
26 Jul 2022
Is a Caption Worth a Thousand Images? A Controlled Study for
  Representation Learning
Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning
Shibani Santurkar
Yann Dubois
Rohan Taori
Percy Liang
Tatsunori Hashimoto
CLIP
VLM
11
41
0
15 Jul 2022
Contrastive Adapters for Foundation Model Group Robustness
Contrastive Adapters for Foundation Model Group Robustness
Michael Zhang
Christopher Ré
VLM
8
61
0
14 Jul 2022
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for
  Vision-Language Pre-training
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang
Youcai Zhang
Ying Cheng
Weiwei Tian
Ruiwei Zhao
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
X. Zhang
VLM
18
14
0
12 Jul 2022
American == White in Multimodal Language-and-Image AI
American == White in Multimodal Language-and-Image AI
Robert Wolfe
Aylin Caliskan
VLM
19
46
0
01 Jul 2022
(Un)likelihood Training for Interpretable Embedding
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
12
2
0
01 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in
  E-commerce
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Previous
1234567
Next