MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Papers citing "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding"