ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.20088
  4. Cited By
Improving CLIP Training with Language Rewrites

Improving CLIP Training with Language Rewrites

31 May 2023
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
    BDL
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Improving CLIP Training with Language Rewrites"

50 / 113 papers shown
Title
Open Vocabulary Multi-Label Video Classification
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul M. Chilimbi
VLM
62
1
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey
  from Co-Development Perspective
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
48
5
0
11 Jul 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal
  Perception
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Xiaotong Li
Fan Zhang
Haiwen Diao
Yueze Wang
Xinlong Wang
Ling-yu Duan
VLM
29
25
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
Semantic Compositions Enhance Vision-Language Contrastive Learning
Semantic Compositions Enhance Vision-Language Contrastive Learning
Maxwell Mbabilla Aladago
Lorenzo Torresani
Soroush Vosoughi
CoGe
VLM
CLIP
36
0
0
01 Jul 2024
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate
  Associative Bias
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
Salma Abdel Magid
Jui-Hsien Wang
Kushal Kafle
Hanspeter Pfister
34
1
0
17 Jun 2024
Exploring the Spectrum of Visio-Linguistic Compositionality and
  Recognition
Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition
Youngtaek Oh
Pyunghwan Ahn
Jinhyung Kim
Gwangmo Song
Soonyoung Lee
In So Kweon
Junmo Kim
CoGe
26
2
0
13 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
36
35
0
12 Jun 2024
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang
Wenbin He
Xiwei Xuan
Clint Sebastian
Jorge Henrique Piazentin Ono
...
Sima Behpour
T. Doan
Liang Gou
Han-Wei Shen
Liu Ren
VLM
27
5
0
07 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain
  Generalization in Vision-Language Foundation Models
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
40
1
0
03 Jun 2024
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for
  Transferable Insights
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
30
3
0
31 May 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a
  Game-Theoretic View
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
43
2
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
Data Augmentation for Text-based Person Retrieval Using Large Language
  Models
Data Augmentation for Text-based Person Retrieval Using Large Language Models
Zheng Li
Lijia Si
Caili Guo
Yang Yang
Qiushi Cao
39
3
0
20 May 2024
FFF: Fixing Flawed Foundations in contrastive pre-training results in
  very strong Vision-Language models
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
VLM
35
4
0
16 May 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Oncel Tuzel
VLM
CLIP
30
6
0
14 May 2024
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for
  Few-shot Learning
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
Yaohui Li
Qifeng Zhou
Haoxing Chen
Jianbing Zhang
Xinyu Dai
Hao Zhou
VLM
45
0
0
15 Apr 2024
Would Deep Generative Models Amplify Bias in Future Models?
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen
Yusuke Hirota
Mayu Otani
Noa Garcia
Yuta Nakashima
27
12
0
04 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
26
19
0
30 Mar 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via
  Negations
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
24
14
0
29 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM
CLIP
32
25
0
25 Mar 2024
UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
Xixuan Hao
Wei Chen
Yibo Yan
Siru Zhong
Kun Wang
Qingsong Wen
Yuxuan Liang
VLM
74
0
0
25 Mar 2024
Debiasing surgeon: fantastic weights and how to find them
Debiasing surgeon: fantastic weights and how to find them
Rémi Nahon
Ivan Luiz De Moura Matos
Van-Tam Nguyen
Enzo Tartaglione
34
1
0
21 Mar 2024
Just Say the Name: Online Continual Learning with Category Names Only
  via Data Generation
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Minhyuk Seo
Diganta Misra
Seongwon Cho
Minjae Lee
Jonghyun Choi
CLL
33
7
0
16 Mar 2024
Masked Generative Story Transformer with Character Guidance and Caption
  Augmentation
Masked Generative Story Transformer with Character Guidance and Caption Augmentation
Christos Papadimitriou
Giorgos Filandrianos
Maria Lymperaiou
Giorgos Stamou
DiffM
92
1
0
13 Mar 2024
A Deep Learning Method for Classification of Biophilic Artworks
A Deep Learning Method for Classification of Biophilic Artworks
Purna Kar
Jordan J. Bird
Yangang Xing
Alexander Sumich
Andrew Knight
Ahmad Lotfi
Benedict Carpenter van Barthold
15
0
0
08 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
64
1
0
01 Mar 2024
Pre-training Cross-lingual Open Domain Question Answering with
  Large-scale Synthetic Supervision
Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision
Fan Jiang
Tom Drummond
Trevor Cohn
CLIP
ELM
LRM
26
3
0
26 Feb 2024
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
Hyunjae Kim
Seunghyun Yoon
Trung Bui
Handong Zhao
Quan Tran
Franck Dernoncourt
Jaewoo Kang
CLIP
19
2
0
23 Feb 2024
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing
Yong Cao
Wenyan Li
Jiaang Li
Yifei Yuan
Antonia Karamolegkou
Daniel Hershcovich
VLM
20
7
0
08 Feb 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud
Hani Itani
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
CLIP
VLM
112
36
0
02 Feb 2024
Learning Vision from Models Rivals Learning Vision from Data
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian
Lijie Fan
Kaifeng Chen
Dina Katabi
Dilip Krishnan
Phillip Isola
22
45
0
28 Dec 2023
Medical Vision Language Pretraining: A survey
Medical Vision Language Pretraining: A survey
Prashant Shrestha
Sanskar Amgain
Bidur Khanal
Cristian A. Linte
Binod Bhattarai
VLM
27
14
0
11 Dec 2023
Leveraging Generative Language Models for Weakly Supervised Sentence
  Component Analysis in Video-Language Joint Learning
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
17
1
0
10 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
16
6
0
06 Dec 2023
Mitigating Fine-Grained Hallucination by Fine-Tuning Large
  Vision-Language Models with Caption Rewrites
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites
Lei Wang
Jiabang He
Shenshen Li
Ning Liu
Ee-Peng Lim
MLLM
18
38
0
04 Dec 2023
SocialCounterfactuals: Probing and Mitigating Intersectional Social
  Biases in Vision-Language Models with Counterfactual Examples
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard
Avinash Madasu
Tiep Le
Gustavo Lujan Moreno
Anahita Bhiwandiwalla
Vasudev Lal
45
16
0
30 Nov 2023
MLLMs-Augmented Visual-Language Representation Learning
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu
Kai Wang
Wenqi Shao
Ping Luo
Yu Qiao
Mike Zheng Shou
Kaipeng Zhang
Yang You
VLM
21
11
0
30 Nov 2023
GELDA: A generative language annotation framework to reveal visual
  biases in datasets
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra
Kathleen M. Lewis
Guha Balakrishnan
VLM
11
1
0
29 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLM
VLM
18
579
0
21 Nov 2023
Deep Tensor Network
Deep Tensor Network
Yifan Zhang
16
0
0
18 Nov 2023
Enhancing Multimodal Compositional Reasoning of Visual Language Models
  with Generative Negative Mining
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining
U. Sahin
Hang Li
Qadeer Ahmad Khan
Daniel Cremers
Volker Tresp
VLM
CoGe
23
12
0
07 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
21
54
0
31 Oct 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP
  Performance on Low-Resource Languages
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
29
4
0
20 Oct 2023
3D-GPT: Procedural 3D Modeling with Large Language Models
3D-GPT: Procedural 3D Modeling with Large Language Models
Chunyi Sun
Junlin Han
Weijian Deng
Xinlong Wang
Zishan Qin
Stephen Gould
37
39
0
19 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
29
28
0
11 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video
  Understanding Tasks
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
29
0
0
07 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation
  by decoupling object-attribute association
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
17
4
0
02 Oct 2023
PerceptionCLIP: Visual Classification by Inferring and Conditioning on
  Contexts
PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
Bang An
Sicheng Zhu
Michael-Andrei Panaitescu-Liess
Chaithanya Kumar Mummadi
Furong Huang
VLM
20
7
0
02 Aug 2023
GIST: Generating Image-Specific Text for Fine-grained Object
  Classification
GIST: Generating Image-Specific Text for Fine-grained Object Classification
Kathleen M. Lewis
Emily Mu
Adrian V. Dalca
John Guttag
VLM
22
7
0
21 Jul 2023
Previous
123
Next