arXiv:2403.17007
DreamLIP: Language-Image Pre-training with Long Captions
25 March 2024
Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
VLM, CLIP
Papers citing "DreamLIP: Language-Image Pre-training with Long Captions"
15 / 15 papers shown
Baichuan-Omni-1.5 Technical Report
Yadong Li, J. Liu, Tao Zhang, Tao Zhang, S. Chen, ..., Jianhua Xu, Haoze Sun, Mingan Lin, Zenan Zhou, Weipeng Chen
AuLLM
64
10
0
28 Jan 2025
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim, Rui Xiao, Mariana-Iuliana Georgescu, Stephan Alaniz, Zeynep Akata
VLM
70
0
0
02 Dec 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki M. Asano, N. V. Noord, Marcel Worring, Cees G. M. Snoek
VLM
33
3
0
13 Oct 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh, Cheng-Yu Hsieh, Shih-Ying Yeh, Louis Béthune, Hadi Pour Ansari, Pavan Kumar Anasosalu Vasu, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Marco Cuturi
54
4
0
09 Jul 2024
MATE: Meet At The Embedding -- Connecting Images with Long Texts
Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim
VLM
16
5
0
26 Jun 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang, Hehe Fan, Yi Yang
36
3
0
24 May 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
125
85
0
07 Mar 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud, Hani Itani, Fabio Pizzati, Philip H. S. Torr, Adel Bibi, Bernard Ghanem
CLIP, VLM
107
34
0
02 Feb 2024
Adversarial Diffusion Distillation
Axel Sauer, Dominik Lorenz, A. Blattmann, Robin Rombach
132
326
0
28 Nov 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang
CLIP, VLM
41
44
0
06 Mar 2023
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee, Jongsuk Kim, Hyounguk Shon, Bumsoo Kim, Seung Wook Kim, Honglak Lee, Junmo Kim
CLIP, VLM
44
51
0
27 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
ELM, ReLM, LRM
198
1,089
0
20 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM, BDL, VLM, CLIP
380
4,010
0
28 Jan 2022
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet-Hung Tran, Fei Tang, ..., David P. Kreil, Michael K Kopp, G. Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
VLM, CLIP
185
101
0
21 Oct 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig
VLM, CLIP
293
2,875
0
11 Feb 2021