ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.07143
  4. Cited By
Reproducible scaling laws for contrastive language-image learning

Reproducible scaling laws for contrastive language-image learning

14 December 2022
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Reproducible scaling laws for contrastive language-image learning"

50 / 114 papers shown
Title
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via D\mathbf{\texttt{D}}Dual-H\mathbf{\texttt{H}}Head O\mathbf{\texttt{O}}Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
38
0
0
12 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon
Federico Girella
Ziyue Liu
Marco Cristani
Yiming Wang
VLM
48
0
0
06 May 2025
Scaling Laws For Scalable Oversight
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
70
0
0
25 Apr 2025
CAMU: Context Augmentation for Meme Understanding
CAMU: Context Augmentation for Meme Understanding
Girish A. Koushik
Diptesh Kanojia
Helen Treharne
Aditya Joshi
VLM
91
0
0
24 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
28
0
0
07 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
64
0
0
03 Apr 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
97
2
0
27 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
106
0
0
20 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
75
0
0
13 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
77
5
0
05 Mar 2025
Pretrained Embeddings as a Behavior Specification Mechanism
Parv Kapoor
Abigail Hammer
Ashish Kapoor
Karen Leung
Eunsuk Kang
26
0
0
03 Mar 2025
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen
Yifei Wang
Zequn Zeng
Zhong Peng
Yudi Su
Xinyang Liu
Bo Chen
Hongwei Liu
Stefanie Jegelka
Chenyu You
CLL
66
2
0
03 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
82
0
0
01 Mar 2025
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
73
1
0
27 Feb 2025
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
Songpei Xu
Shijia Wang
Da Guo
Xianwen Guo
Qiang Xiao
Fangjian Li
Chuanjiang Luo
76
0
0
17 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
Qian He
Xinglong Wu
DiffM
VGen
50
5
0
16 Feb 2025
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Ruoxuan Feng
Jiangyu Hu
Wenke Xia
Tianci Gao
Ao Shen
Yuhao Sun
Bin Fang
Di Hu
42
3
0
15 Feb 2025
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Khanh Nguyen
Ghulam Mubashar Hassan
Ajmal Saeed Mian
3DPC
42
0
0
15 Feb 2025
SWA-LDM: Toward Stealthy Watermarks for Latent Diffusion Models
SWA-LDM: Toward Stealthy Watermarks for Latent Diffusion Models
Z. Yang
Linye Lyu
Xuanhang Chang
Daojing He
Yu Li
36
0
0
14 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begum Demir
Ioannis Papoutsis
VLM
81
0
0
13 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
112
0
0
12 Feb 2025
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang
Adam Dziedzic
Grace C. Kim
Michael Backes
Franziska Boenisch
79
0
0
11 Feb 2025
Keep It Light! Simplifying Image Clustering Via Text-Free Adapters
Keep It Light! Simplifying Image Clustering Via Text-Free Adapters
Yicen Li
Haitz Sáez de Ocáriz Borde
Anastasis Kratsios
Paul D. McNicholas
VLM
88
0
0
06 Feb 2025
RandLoRA: Full-rank parameter-efficient fine-tuning of large models
RandLoRA: Full-rank parameter-efficient fine-tuning of large models
Paul Albert
Frederic Z. Zhang
Hemanth Saratchandran
Cristian Rodriguez-Opazo
Anton van den Hengel
Ehsan Abbasnejad
94
0
0
03 Feb 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
J. Zhu
55
0
0
28 Jan 2025
Rethinking the Bias of Foundation Model under Long-tailed Distribution
Rethinking the Bias of Foundation Model under Long-tailed Distribution
Jiahao Chen
Bin Qin
Jiangmeng Li
Hao Chen
Bing-Huang Su
77
0
0
27 Jan 2025
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
45
0
0
19 Jan 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano
M. W. Sun
James Burgess
Liangyu Chen
Jeffrey Nirschl
...
Xiaohan Wang
Yuhui Zhang
Alfred Seunghoon Song
Robert Tibshirani
Serena Yeung-Levy
LM&MA
VLM
MedIm
58
6
0
13 Jan 2025
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MA
MedIm
151
198
0
10 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
60
6
0
03 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
51
18
0
03 Jan 2025
Demystifying CLIP Data
Demystifying CLIP Data
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao (Bernie) Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
CLIP
23
107
0
31 Dec 2024
Adversarial Hubness in Multi-Modal Retrieval
Adversarial Hubness in Multi-Modal Retrieval
Tingwei Zhang
Fnu Suya
Rishi Jha
Collin Zhang
Vitaly Shmatikov
AAML
81
1
0
18 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
70
0
0
02 Dec 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang
Wei Sun
Xinyue Li
Yunhao Li
Qihang Ge
...
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
EGVM
117
1
0
25 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
66
0
0
18 Nov 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
Huayang Huang
Yu Wu
Qian Wang
DiffM
WIGM
49
4
0
06 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
45
1
0
31 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
40
2
0
25 Oct 2024
Fast constrained sampling in pre-trained diffusion models
Fast constrained sampling in pre-trained diffusion models
Alexandros Graikos
Nebojsa Jojic
Dimitris Samaras
DiffM
25
1
0
24 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024
Probabilistic Language-Image Pre-Training
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
75
4
2
24 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
30
3
0
21 Oct 2024
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Seulbi Lee
J. Kim
Sangheum Hwang
LRM
75
0
0
19 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
43
1
0
18 Oct 2024
An Online Learning Approach to Prompt-based Selection of Generative Models
An Online Learning Approach to Prompt-based Selection of Generative Models
Xiaoyan Hu
Ho-fung Leung
Farzan Farnia
33
2
0
17 Oct 2024
123
Next