2112.12750
Cited By
SLIP: Self-supervision meets Language-Image Pre-training
23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
VLM
CLIP
Papers citing
"SLIP: Self-supervision meets Language-Image Pre-training"
50 / 337 papers shown
Title
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
Enhao Zhang
Chaohua Li
Chuanxing Geng
Songcan Chen
52
0
0
08 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
36
0
0
04 May 2025
Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders
Xuwei Yang
Fatemeh Tavakoli
D. B. Emerson
Anastasis Kratsios
FedML
62
0
0
30 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
57
0
0
28 Apr 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
J. Z. Wang
Haoran Xu
Ziyong Feng
Y. Wang
VLM
89
0
0
23 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
42
0
0
10 Apr 2025
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach
Shyamgopal Karthik
Quentin Bouniot
Serge Belongie
Zeynep Akata
VLM
62
0
0
03 Apr 2025
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
48
1
0
21 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
95
3
0
19 Mar 2025
A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
Asifullah Khan
Laiba Asmatullah
Anza Malik
Shahzaib Khan
Hamna Asif
SSL
VLM
74
0
0
14 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
75
0
0
13 Mar 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
53
0
0
10 Mar 2025
DiffCLIP: Differential Attention Meets CLIP
Hasan Hammoud
Bernard Ghanem
VLM
42
0
0
09 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
J. Chen
Jianke Zhu
3DV
LRM
73
3
0
01 Mar 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
Haoyuan Li
Yanpeng Zhou
Tao Tang
Jifei Song
Yihan Zeng
Michael C. Kampffmeyer
Hang Xu
Xiaodan Liang
3DGS
57
1
0
25 Feb 2025
Object-centric Binding in Contrastive Language-Image Pretraining
Rim Assouel
Pietro Astolfi
Florian Bordes
M. Drozdzal
Adriana Romero Soriano
OCL
VLM
CoGe
103
0
0
19 Feb 2025
HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads
Guobing Gan
Kaiming Gao
Li Wang
Shen Jiang
Peng Jiang
64
0
0
09 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
99
2
0
06 Feb 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
97
17
0
17 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
86
11
0
06 Jan 2025
How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
Siyuan Feng
Teruya Yoshinaga
Katsuhiko Hayashi
Koki Washio
Hidetaka Kamigaito
28
0
0
26 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Y. Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
69
1
0
10 Dec 2024
Attention Head Purification: A New Perspective to Harness CLIP for Domain Generalization
Yingfan Wang
Guoliang Kang
VLM
74
0
0
10 Dec 2024
DiffCLIP: Few-shot Language-driven Multimodal Classifier
Jiaqing Zhang
Mingxiang Cao
Xue Yang
Kai Jiang
Yunsong Li
VLM
66
0
0
10 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
70
0
0
02 Dec 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
85
11
0
23 Nov 2024
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training
Ameera Bawazir
Kebin Wu
Wenbin Li
CLIP
67
1
0
20 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
74
2
0
11 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Y. Yang
VGen
45
3
0
05 Nov 2024
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
SSL
23
0
0
01 Nov 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
30
1
0
30 Oct 2024
Active Learning for Vision-Language Models
Bardia Safaei
Vishal M. Patel
VLM
34
2
0
29 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
30
3
0
21 Oct 2024
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao
Mahyar Najibi
Sachin Mehta
CLIP
DiffM
25
1
0
15 Oct 2024
A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem
Kun Ding
Ying Wang
Gaofeng Meng
Shiming Xiang
VLM
29
0
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
30
0
0
01 Oct 2024
Image Copy Detection for Diffusion Models
Wenhao Wang
Yifan Sun
Zhentao Tan
Yi Yang
28
1
0
30 Sep 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar
Raghav Singhal
Pranamya Kulkarni
Deval Mehta
Kshitij Jadhav
15
0
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
43
1
0
26 Sep 2024
Adversarial Backdoor Defense in CLIP
Junhao Kuang
Siyuan Liang
Jiawei Liang
Kuanrong Liu
Xiaochun Cao
AAML
34
2
0
24 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
22
3
0
11 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
33
6
0
10 Sep 2024
Towards Generalizable Scene Change Detection
Jaewoo Kim
Uehwan Kim
38
0
0
10 Sep 2024
How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?
Sicheng Wang
Che Liu
Rossella Arcucci
VLM
MedIm
32
0
0
31 Aug 2024
HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling
Yubin Wang
Xinyang Jiang
De Cheng
Wenli Sun
Dongsheng Li
Cairong Zhao
VLM
40
0
0
27 Aug 2024
Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them
H. Haresamudram
Apoorva Beedu
Mashfiqui Rabbi
Sankalita Saha
Irfan Essa
Thomas Ploetz
26
4
0
21 Aug 2024