2112.12750
SLIP: Self-supervision meets Language-Image Pre-training
23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
VLM
CLIP
Papers citing
"SLIP: Self-supervision meets Language-Image Pre-training"
50 / 337 papers shown
Title
PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition
Xi Fang
Weigang Wang
Xiaoxin Lv
Jun Yan
EGVM
34
3
0
20 Apr 2024
Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels
Amaya Dharmasiri
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
VLM
3DPC
34
1
0
15 Apr 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
43
7
0
15 Apr 2024
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Yuwei Tang
Zhenyi Lin
Qilong Wang
Pengfei Zhu
Qinghua Hu
26
11
0
13 Apr 2024
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani
Amit Raj
Kevis-Kokitsi Maninis
Abhishek Kar
Yuanzhen Li
Michael Rubinstein
Deqing Sun
Leonidas J. Guibas
Justin Johnson
Varun Jampani
28
79
0
12 Apr 2024
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Zichao Li
Cihang Xie
E. D. Cubuk
CLIP
32
8
0
12 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
118
44
0
04 Apr 2024
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen
Yusuke Hirota
Mayu Otani
Noa Garcia
Yuta Nakashima
27
12
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
28
24
0
02 Apr 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
24
14
0
29 Mar 2024
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee
Sanghyuk Chun
Sangdoo Yun
VLM
16
1
0
27 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM
CLIP
32
24
0
25 Mar 2024
Centered Masking for Language-Image Pre-Training
Mingliang Liang
Martha Larson
VLM
CLIP
23
4
0
23 Mar 2024
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
Li Mi
Chang Xu
J. Castillo-Navarro
Syrielle Montariol
Wen Yang
Antoine Bosselut
D. Tuia
ObjD
13
4
0
20 Mar 2024
Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity
Siddharth Joshi
Arnav Jain
Ali Payani
Baharan Mirzasoleiman
VLM
CLIP
31
8
0
18 Mar 2024
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
29
9
0
15 Mar 2024
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin
Haoli Bai
Zhili Liu
Lu Hou
Muyi Sun
Linqi Song
Ying Wei
Zhenan Sun
CLIP
VLM
50
14
0
12 Mar 2024
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li
Yuke Li
Zhekai Du
Fengling Li
Ke Lu
Jingjing Li
VLM
39
4
0
11 Mar 2024
Differentially Private Representation Learning via Image Captioning
Tom Sander
Yaodong Yu
Maziar Sanjabi
Alain Durmus
Yi-An Ma
Kamalika Chaudhuri
Chuan Guo
48
3
0
04 Mar 2024
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
38
3
0
27 Feb 2024
Parameter-efficient Prompt Learning for 3D Point Cloud Understanding
Hongyu Sun
Yongcai Wang
Wang Chen
Haoran Deng
Deying Li
VPVLM
44
5
0
24 Feb 2024
Analysis of Using Sigmoid Loss for Contrastive Learning
Chungpa Lee
Joonhwan Chang
Jy-yong Sohn
35
2
0
20 Feb 2024
GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data
Haoyuan Li
Yanpeng Zhou
Yihan Zeng
Hang Xu
Xiaodan Liang
3DGS
CLIP
19
0
0
09 Feb 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud
Hani Itani
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
CLIP
VLM
112
35
0
02 Feb 2024
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Apoorva Beedu
Karan Samel
Irfan Essa
45
2
0
23 Jan 2024
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
VLM
14
0
0
22 Jan 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
23
0
0
19 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
48
9
0
19 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
26
3
0
15 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
33
7
0
12 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
27
283
0
11 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
43
8
0
06 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
34
3
0
04 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Xiaocong Zhou
Delong Chen
VLM
OffRL
20
22
0
03 Jan 2024
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
44
3
0
30 Dec 2023
Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
Ying Wang
Tim G. J. Rudner
Andrew Gordon Wilson
10
19
0
28 Dec 2023
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
Rongsheng Wang
Qingsong Yao
Haoran Lai
Zhiyang He
Xiaodong Tao
Zihang Jiang
S. Kevin Zhou
VLM
MedIm
31
6
0
20 Dec 2023
Mutual-modality Adversarial Attack with Semantic Perturbation
Jingwen Ye
Ruonan Yu
Songhua Liu
Xinchao Wang
AAML
24
9
0
20 Dec 2023
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim
Yeonsik Jo
Jinhyung Kim
S. Kim
VLM
8
6
0
19 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
16
3
0
19 Dec 2023
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao
Haoqin Tu
Chen Wei
Jieru Mei
Cihang Xie
6
31
0
18 Dec 2023
Domain Prompt Learning with Quaternion Networks
Qinglong Cao
Zhengqin Xu
Yuntian Chen
Chao Ma
Xiaokang Yang
VLM
27
10
0
12 Dec 2023
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
38
15
0
11 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP
VLM
28
61
0
07 Dec 2023
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie
Wei He
Kai Han
Yehui Tang
Tianyu Guo
Fanyi Du
Yunhe Wang
VLM
13
3
0
01 Dec 2023
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard
Avinash Madasu
Tiep Le
Gustavo Lujan Moreno
Anahita Bhiwandiwalla
Vasudev Lal
43
16
0
30 Nov 2023
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu
Kai Wang
Wenqi Shao
Ping Luo
Yu Qiao
Mike Zheng Shou
Kaipeng Zhang
Yang You
VLM
21
11
0
30 Nov 2023
MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition
Dan Song
Xinwei Fu
Weizhi Nie
Wenhui Li
Lanjun Wang
You Yang
Anan Liu
VLM
27
6
0
30 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
35
0
0
28 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
24
43
0
28 Nov 2023