Enhancing Multimodal Large Language Models with Multi-instance Visual
Prompt Generator for Visual Representation Enrichment

Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

5 June 2024

Wenliang Zhong

Rob Barton

Karim Bouyarmane

Ismail B. Tutar

Papers citing "Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment"

6 / 6 papers shown

Title
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks Mohammad Saleha Azadeh Tabatabaeib 52 0 0 14 Apr 2025
Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality Viktoriia Chekalina Anton Razzigaev Elizaveta Goncharova Andrey Kuznetsov KELM 71 0 0 18 Nov 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 270 4,244 0 30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 392 4,137 0 28 Jan 2022
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding Jasmine Collins Shubham Goel Kenan Deng Achleshwar Luthra Leon L. Xu ... T. F. Y. Vicente T. Dideriksen H. Arora M. Guillaumin Jitendra Malik 154 217 0 12 Oct 2021
Densely Connected Convolutional Networks Gao Huang Zhuang Liu L. V. D. van der Maaten Kilian Q. Weinberger PINN 3DV 261 36,371 0 25 Aug 2016