2408.04594
Cited By
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
8 August 2024
Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen
VLM
Papers citing "Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models"
3 / 3 papers shown
Title
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman
95
1
0
03 Dec 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur, Darshan Singh, Makarand Tapaswi
47
1
0
04 Sep 2024
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022