arXiv:2304.13273
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
26 April 2023
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP
VLM
Papers citing "From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping"
5 / 5 papers shown
Title
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
35
0
0
10 Jul 2023
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
41
69
0
01 Nov 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
182
403
0
13 Jul 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019