Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2305.20088
Cited By
v1
v2 (latest)
Improving CLIP Training with Language Rewrites
Neural Information Processing Systems (NeurIPS), 2023
31 May 2023
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Github (280★)
Papers citing
"Improving CLIP Training with Language Rewrites"
29 / 79 papers shown
Finetuning CLIP to Reason about Pairwise Differences
Dylan Sam
Devin Willmott
João Dias Semedo
J. Zico Kolter
VLM
362
8
0
15 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
International Conference on Learning Representations (ICLR), 2024
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
451
32
0
02 Sep 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Isprs Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2024
Junyao Ge
Xu Zhang
Yang Zheng
Kaitai Guo
Jimin Liang
618
6
0
27 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
309
3
0
29 Jul 2024
LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
Achintya Gopal
Nicholas Wai Long Lau
Eva Adelina Susanto
Chi Lok Yu
Aditya Paul
ELM
257
11
0
27 Jul 2024
AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools
Aditya Paul
Chi Lok Yu
Eva Adelina Susanto
Nicholas Wai Long Lau
Gwenyth Isobel Meadows
LLMAG
266
7
0
27 Jul 2024
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
S. D. Tran
Mubarak Shah
Benjamin Z. Yao
Trishul Chilimbi
VLM
241
6
0
12 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
382
7
0
09 Jul 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
383
64
0
26 May 2024
Data Augmentation for Text-based Person Retrieval Using Large Language Models
Zheng Li
Lijia Si
Caili Guo
Yang Yang
Qiushi Cao
200
10
0
20 May 2024
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
Yaohui Li
Qifeng Zhou
Haoxing Chen
Jianbing Zhang
Xinyu Dai
Hao Zhou
VLM
275
1
0
15 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
237
36
0
30 Mar 2024
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Minhyuk Seo
Diganta Misra
Seongwon Cho
Minjae Lee
Jonghyun Choi
CLL
350
12
0
16 Mar 2024
A Deep Learning Method for Classification of Biophilic Artworks
Purna Kar
Jordan J. Bird
Yangang Xing
Alexander Sumich
Andrew Knight
Ahmad Lotfi
Benedict Carpenter van Barthold
178
0
0
08 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
522
4
0
01 Mar 2024
Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision
Fan Jiang
Tom Drummond
Trevor Cohn
CLIP
ELM
LRM
300
5
0
26 Feb 2024
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
Hyunjae Kim
Seunghyun Yoon
Trung Bui
Handong Zhao
Quan Tran
Franck Dernoncourt
Jaewoo Kang
CLIP
238
4
0
23 Feb 2024
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
309
1
0
10 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
395
14
0
06 Dec 2023
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Computer Vision and Pattern Recognition (CVPR), 2023
Phillip Howard
Avinash Madasu
Tiep Le
Gustavo Lujan Moreno
Anahita Bhiwandiwalla
Vasudev Lal
328
48
0
30 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
380
934
0
21 Nov 2023
Deep Tensor Network
Yifan Zhang
370
0
0
18 Nov 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
European Conference on Computer Vision (ECCV), 2023
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
361
58
0
11 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
288
6
0
02 Oct 2023
Improving Multimodal Datasets with Image Captioning
Neural Information Processing Systems (NeurIPS), 2023
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
260
125
0
19 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Neural Information Processing Systems (NeurIPS), 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Jiaming Song
340
166
0
03 Jul 2023
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
Luigi Celona
Simone Bianco
Marco Donzella
Paolo Napoletano
252
26
0
20 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
International Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
287
39
0
12 Jun 2023
Vision-Language Models for Vision Tasks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
497
1,014
0
03 Apr 2023
Previous
1
2