Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

26 April 2024

Papers citing "Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models"

4 / 4 papers shown

Title
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images Zalan Fabian Zhongqi Miao Chunyuan Li Yuanhan Zhang Ziwei Liu ... Laura Siabatto Andrés Link Pablo Arbelaez Rahul Dodhia J. L. Ferres 38 10 0 02 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface Chen Wei Chenxi Liu Siyuan Qiao Zhishuai Zhang Alan Yuille Jiahui Yu VLM DiffM 29 10 0 01 Nov 2023
Fine-grained Image Captioning with CLIP Reward Jaemin Cho Seunghyun Yoon Ajinkya Kale Franck Dernoncourt Trung Bui Mohit Bansal CLIP 123 76 0 26 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 390 4,125 0 28 Jan 2022