ResearchTrend.AI
SLIP: Self-supervision meets Language-Image Pre-training

23 December 2021
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
    VLM
    CLIP

Papers citing "SLIP: Self-supervision meets Language-Image Pre-training"

50 / 337 papers shown
HYDEN: Hyperbolic Density Representations for Medical Images and Reports
Zhi Qiao
Linbin Han
Xiantong Zhen
Jia-Hong Gao
Zhen Qian
31
0
0
19 Aug 2024
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang
Tiancheng Gu
Xiang An
Haiqiang Jiang
Xiangzi Dai
Ziyong Feng
Weidong Cai
Jiankang Deng
VLM
39
7
0
18 Aug 2024
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
Eman Ali
Sathira Silva
Muhammad Haris Khan
VLM
29
0
0
16 Aug 2024
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang
Guoliang Kang
VLM
SSL
42
7
0
13 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
37
3
0
08 Aug 2024
FMiFood: Multi-modal Contrastive Learning for Food Image Classification
Xinyue Pan
Jiangpeng He
F. Zhu
24
2
0
07 Aug 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
32
0
0
28 Jul 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
Yifan Li
Yikai Wang
Yanwei Fu
Dongyu Ru
Zheng-Wei Zhang
Tong He
VLM
27
3
0
25 Jul 2024
When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
Adam Goodge
Bryan Hooi
Wee Siong Ng
26
0
0
24 Jul 2024
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
38
0
0
21 Jul 2024
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
20
4
0
21 Jul 2024
Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation
Jinda Lu
Shuo Wang
Yanbin Hao
Haifeng Liu
Xiang Wang
Meng Wang
28
2
0
19 Jul 2024
MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis
Ziming Zhong
Yanxu Xu
Jing Li
Jiale Xu
Zhengxin Li
Chaohui Yu
Shenghua Gao
3DV
22
3
0
18 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
22
5
0
15 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
46
2
0
01 Jul 2024
Semantic Compositions Enhance Vision-Language Contrastive Learning
Maxwell Mbabilla Aladago
Lorenzo Torresani
Soroush Vosoughi
CoGe
VLM
CLIP
36
0
0
01 Jul 2024
Learning Robust 3D Representation from CLIP via Dual Denoising
Shuqing Luo
Bowen Qu
Wei-Nan Gao
39
1
0
01 Jul 2024
Fairness and Bias in Multimodal AI: A Survey
Tosin P. Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
48
9
0
27 Jun 2024
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami
Gerard de Melo
VLM
33
3
0
25 Jun 2024
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam
C. Conwell
Christopher Wang
Gabriel Kreiman
Boris Katz
Ignacio Cases
Andrei Barbu
30
8
0
20 Jun 2024
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
Salma Abdel Magid
Jui-Hsien Wang
Kushal Kafle
Hanspeter Pfister
34
1
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
36
3
0
17 Jun 2024
Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment
Fei Zhou
Zhicong Huang
Tianhao Gu
Guoping Qiu
CoGe
VLM
51
1
0
14 Jun 2024
Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings
Keno Moenck
Duc Trung Thieu
Julian Koch
Thorsten Schuppstuhl
VLM
27
0
0
14 Jun 2024
Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition
Youngtaek Oh
Pyunghwan Ahn
Jinhyung Kim
Gwangmo Song
Soonyoung Lee
In So Kweon
Junmo Kim
CoGe
26
2
0
13 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrell
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGe
LRM
30
7
0
12 Jun 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
32
13
0
11 Jun 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
43
0
0
11 Jun 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Zijia Song
Z. Zang
Yelin Wang
Guozheng Yang
Jiangbin Zheng
Kaicheng Yu
Wanyu Chen
Stan Z. Li
31
0
0
09 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Masahiro Yasuda
Shunsuke Tsubaki
Keisuke Imoto
VLM
36
5
0
04 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Qi Zhang
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
49
3
0
03 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
38
1
0
03 Jun 2024
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
30
3
0
31 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
21
0
0
27 May 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
22
2
0
26 May 2024
BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection
Yuwei Niu
Shuo He
Qi Wei
Feng Liu
Lei Feng
AAML
33
1
0
24 May 2024
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Abdelrahman Abdelhamed
Mahmoud Afifi
Alec Go
MLLM
VLM
29
3
0
24 May 2024
Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
Young Kyun Jang
Donghyun Kim
Ser-nam Lim
VLM
19
0
0
23 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL
VLM
34
0
0
23 May 2024
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
VLM
35
4
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
29
12
0
16 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
26
8
0
14 May 2024
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Yifei Ming
Yixuan Li
VLM
23
7
0
02 May 2024
PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
Dongyun Lin
Yi Cheng
Shangbo Mao
Aiyuan Guo
Yiqun Li
29
2
0
30 Apr 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
59
19
0
30 Apr 2024
OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images
Ye Mao
Junpeng Jing
K. Mikolajczyk
VLM
32
0
0
25 Apr 2024
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
Ayumu Saito
Prachi Kudeshia
Jiju Poovvancheri
3DPC
33
7
0
25 Apr 2024
MoDE: CLIP Data Experts via Clustering
Jiawei Ma
Po-Yao Huang
Saining Xie
Shang-Wen Li
Luke Zettlemoyer
Shih-Fu Chang
Wen-tau Yih
Hu Xu
MoE
CLIP
VLM
23
10
0
24 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
29
6
0
24 Apr 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua
Jing Shi
Kushal Kafle
Simon Jenni
Daoan Zhang
John Collomosse
Scott D. Cohen
Jiebo Luo
CoGe
VLM
42
9
0
23 Apr 2024