Improving CLIP Training with Language Rewrites

Neural Information Processing Systems (NeurIPS), 2023
31 May 2023
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL, VLM, CLIP
ArXiv (abs) | PDF | HTML | HuggingFace (2 upvotes) | GitHub (280★)

Papers citing "Improving CLIP Training with Language Rewrites"

50 / 78 papers shown
DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering
Toshiki Katsube
Taiga Fukuhara
Kenichiro Ando
Yusuke Mukuta
Kohei Uehara
Tatsuya Harada
VLM
76
0
0
30 Nov 2025
Scaling Self-Supervised and Cross-Modal Pretraining for Volumetric CT Transformers
Cris Claessens
Christiaan Viviers
Giacomo D'Amicantonio
Egor Bondarev
Fons van der Sommen
MedIm, ViT
184
0
0
21 Nov 2025
Contrastive vision-language learning with paraphrasing and negation
K. Ngan
Saman Sadeghi Afgeh
Joe Townsend
Artur Garcez
VLM
144
0
0
20 Nov 2025
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
Jian Zhang
Junyi Guo
Junyi Yuan
Huanda Lu
Yanlin Zhou
Fangyu Wu
Qiufeng Wang
Dongming Lu
68
0
0
09 Nov 2025
Caption Injection for Optimization in Generative Search Engine
Xiaolu Chen
Yong Liao
DiffM
88
0
0
06 Nov 2025
PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning
Yicheng Xiao
Yihao Chen
H. Ma
Jiale Hong
Caorui Li
Lingxiang Wu
Haiyun Guo
Jinqiao Wang
CLIP, VLM
123
0
0
06 Nov 2025
Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition
Pei Peng
MingKun Xie
Hang Hao
Tong Jin
ShengJun Huang
BDL, CML
237
0
0
30 Oct 2025
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Ziheng Zhang
Xinyue Ma
A. Chowdhury
Elizabeth G. Campolongo
Matthew J. Thompson
...
Hilmar Lapp
Tanya Berger-Wolf
Yu-Chuan Su
Wei-Lun Chao
Jianyang Gu
204
0
0
23 Oct 2025
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Leander Girrbach
Stephan Alaniz
Genevieve Smith
Trevor Darrell
Zeynep Akata
185
1
0
04 Oct 2025
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Long Xing
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Jianze Liang
Qidong Huang
Jiaqi Wang
Feng Wu
Dahua Lin
OffRL, VLM
94
4
0
26 Sep 2025
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
Xiaofu Chen
Israfel Salazar
Yova Kementchedjhieva
172
1
0
04 Sep 2025
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
Yanqing Liu
Xianhang Li
Letian Zhang
Zirui Wang
Zeyu Zheng
Yuyin Zhou
Cihang Xie
VLM
181
2
0
01 Sep 2025
MobileCLIP2: Improving Multi-Modal Reinforced Training
Fartash Faghri
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Alexander Toshev
Oncel Tuzel
Hadi Pouransari
CLIP, VLM
416
1
0
28 Aug 2025
Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images
Kaiyu Li
Xiangyong Cao
Ruixun Liu
Shihong Wang
Zixuan Jiang
Zhi Wang
Deyu Meng
109
2
0
25 Aug 2025
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
Yuchen Zhou
Jiayu Tang
Shuo Yang
Xiaoyan Xiao
Yuqin Dai
Wenhao Yang
Chao Gou
Xiaobo Xia
Tat-Seng Chua
VLM, CoGe, LRM
137
1
0
15 Aug 2025
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei
Guangting Wang
Xiaoxiao Ma
Ke Mei
Huajun Chen
Yi-jing Jin
Fengyun Rao
CLIP, MLLM, VLM
149
5
0
30 Jul 2025
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Computer Vision and Pattern Recognition (CVPR), 2025
Shaoan Xie
Lingjing Kong
Yujia Zheng
Yu Yao
Zeyu Tang
Eric Xing
Guangyi Chen
Kun Zhang
VLM
198
3
0
29 Jul 2025
Mining Contextualized Visual Associations from Images for Creativity Understanding
Ananya Sahu
Amith Ananthram
Kathleen McKeown
157
0
0
25 Jul 2025
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim
MinJu Jeon
Ye-Chan Kim
Soeun Lee
Taewhan Kim
Dong-Jin Kim
161
3
0
24 Jul 2025
Improving Large Vision-Language Models' Understanding for Field Data
Xiaomei Zhang
Hanyu Zheng
Xiangyu Zhu
Jinghuan Wei
Junhong Zou
Zhen Lei
Zhaoxiang Zhang
VLM
107
0
0
24 Jul 2025
FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang
Zhiwei Ning
Jianyu Ding
Xuanang Gao
Yin Li
Dongsheng Jiang
J. Yang
Wei Liu
CLIP, VLM
202
5
0
14 Jul 2025
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Leqi Shen
Guoqiang Gong
Tianxiang Hao
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Jungong Han
Guiguang Ding
174
4
0
10 Jun 2025
Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kyeonghyun Kim
Jinhee Jang
Juhwan Choi
Yoonji Lee
Kyohoon Jin
Youngbin Kim
200
0
0
09 Jun 2025
Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning
Amit Peleg
Naman D. Singh
Matthias Hein
CoGe, VLM
312
1
0
30 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
145
1
0
28 May 2025
CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs
Yiqing Zhang
Xiaozhong Liu
Fabricio Murai
134
1
0
24 May 2025
Cultural Awareness in Vision-Language Models: A Cross-Country Exploration
Avinash Madasu
Vasudev Lal
Phillip Howard
VLM
172
2
0
23 May 2025
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Siyuan Yan
Xiaochen Li
Ming Hu
Yiwen Jiang
Zhen Yu
Zongyuan Ge
MedIm, VLM
232
5
0
14 May 2025
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLM, VPVLM
517
1
0
29 Apr 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
Chao Guo
Haoran Xu
Ziyong Feng
Longji Xu
VLM
670
7
0
23 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD, VOS
588
96
0
17 Apr 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
264
3
0
18 Mar 2025
Concept-as-Tree: A Controllable Synthetic Data Framework Makes Stronger Personalized VLMs
Ruichuan An
Kai Zeng
Ming Lu
Sihan Yang
Renrui Zhang
Huitong Ji
Qizhe Zhang
Yihao Luo
421
4
0
17 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
339
0
0
17 Mar 2025
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Haoyuan Gao
Zicong Zhang
Yuqi Wei
Linglan Zhao
Guilin Li
Rui Wang
Linghe Kong
Weiran Huang
CLL, VLM
726
0
0
12 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Computer Vision and Pattern Recognition (CVPR), 2025
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
519
5
0
07 Mar 2025
FAA-CLIP: Federated Adversarial Adaptation of CLIP
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Yihang Wu
Ahmad Chaddad
Christian Desrosiers
Tareef Daqqaq
R. Kateb
VLM
281
4
0
26 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP, VLM
287
22
0
20 Feb 2025
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
Tiancheng Gu
Kaicheng Yang
Chaoyi Zhang
Yin Xie
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
CLIP, VLM
451
5
0
18 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Tim Siebert
Ioannis Papoutsis
VLM
406
4
0
13 Feb 2025
Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Bootstrapping
Pu Yang
Yunzhen Feng
Ziyuan Chen
Yuhang Wu
Zhuoyuan Li
DiffM
338
1
0
31 Jan 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLM, VLM
366
1
0
07 Jan 2025
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
288
2
0
31 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
633
7
0
02 Dec 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
557
3
0
18 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
394
10
0
11 Nov 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
388
16
0
21 Oct 2024
LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Anh-Quan Cao
M. Jaritz
Matthieu Guillaumin
Raoul de Charette
Loris Bazzani
VLM, CLIP
295
4
0
10 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Neural Information Processing Systems (NeurIPS), 2024
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
355
23
0
30 Sep 2024
Finetuning CLIP to Reason about Pairwise Differences
Dylan Sam
Devin Willmott
João Dias Semedo
J. Zico Kolter
VLM
317
8
0
15 Sep 2024