Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
arXiv:1505.04870

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

Showing 50 of 1,325 citing papers (page 3 of 27)
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang
Binzhu Xie
Zhonghao Yan
Yuli Zhang
Donghao Zhou
Xiaofei Chen
Shi Qiu
Jiaqi Liu
Guoyang Xie
Zhichao Lu
163
2
0
29 Jul 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM, LRM
275
1
0
29 Jul 2025
ZSE-Cap: A Zero-Shot Ensemble for Image Retrieval and Prompt-Guided Captioning
Duc-Tai Dinh
Duc Anh Khoa Dinh
VLM
87
0
0
28 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
267
3
0
28 Jul 2025
Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation
Xinshu Li
Ruoyu Wang
Erdun Gao
Mingming Gong
Lina Yao
DiffM
189
0
0
26 Jul 2025
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu
Minghe Weng
Zekang Xiao
Rui Jiang
Wei Su
Guangcong Zheng
Ping Lu
Xi Li
MoE, ObjD
167
2
0
23 Jul 2025
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
Yizhi Hu
Zezhao Tian
Xingqun Qi
Chen Su
Bingkun Yang
Junhui Yin
Muyi Sun
Man Zhang
Zhenan Sun
ObjD
149
0
0
22 Jul 2025
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
Xiaojie Li
Chu Li
Shi-Zhe Chen
Xi Chen
OffRL
251
3
0
20 Jul 2025
FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang
Zhiwei Ning
Jianyu Ding
Xuanang Gao
Yin Li
Dongsheng Jiang
J. Yang
Wei Liu
CLIP, VLM
234
7
0
14 Jul 2025
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
Yibo Lyu
Rui Shao
Gongwei Chen
Yijie Zhu
Weili Guan
Liqiang Nie
236
9
0
10 Jul 2025
With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You
Fabian Gröger
Shuo Wen
Huyen Le
Maria Brbic
262
1
0
20 Jun 2025
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Bonan Li
Yinhan Hu
Songhua Liu
Xinchao Wang
DiffM
240
2
0
18 Jun 2025
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
Ruiguang Pei
W. Sun
Zhihui Fu
Jun Wang
VLM
112
0
0
16 Jun 2025
CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Naihao Deng
Kapotaksha Das
Amélie Reymond
Vitaliy Popov
M. Abouelenien
230
0
0
15 Jun 2025
On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval
Seongbo Jang
Seonghyeon Lee
Dongha Lee
Hwanjo Yu
195
0
0
13 Jun 2025
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Chuan He
Zhaosong Lu
Defeng Sun
Zhanwang Deng
172
10
0
12 Jun 2025
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Jaewoo Song
Harshvardhan Sikka
243
0
0
10 Jun 2025
Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing
Christos Margadji
Sebastian W. Pattinson
AI4CE
115
1
0
10 Jun 2025
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Ruiyang Zhang
Hu Zhang
Hao Fei
Zhedong Zheng
UQCV
285
0
0
09 Jun 2025
Synthetic Visual Genome
Computer Vision and Pattern Recognition (CVPR), 2025
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
224
0
0
09 Jun 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
219
0
0
06 Jun 2025
DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models
Jiarui Wang
Huiyu Duan
Juntong Wang
Ziheng Jia
Woo Yi Yang
...
Yu Zhao
Jiaying Qian
Yuke Xing
Guangtao Zhai
Xiongkuo Min
253
3
0
03 Jun 2025
R2SM: Referring and Reasoning for Selective Masks
Yu-Lin Shih
Wei-En Tai
Cheng Sun
Y. Wang
Hwann-Tzong Chen
ISeg
352
0
0
02 Jun 2025
Data Pruning by Information Maximization
International Conference on Learning Representations (ICLR), 2025
Haoru Tan
Sitong Wu
Wei Huang
Shizhen Zhao
Xiaojuan Qi
331
8
0
02 Jun 2025
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
Ying Yang
Jie Zhang
Xiao Lv
Di Lin
Tao Xiang
Qing Guo
AAML, VLM
167
0
0
30 May 2025
Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning
Amit Peleg
Naman D. Singh
Matthias Hein
CoGe, VLM
367
2
0
30 May 2025
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo
Zhaoyi Li
Jiacheng Liu
Jiacheng Cui
Xiaohan Zhao
Zhiqiang Shen
LLMAG, LRM, VLM
283
7
0
30 May 2025
Benchmarking Foundation Models for Zero-Shot Biometric Tasks
Redwan Sony
Parisa Farmanifard
Hamzeh Alzwairy
Nitish Shukla
Arun Ross
CVBM, VLM
267
5
0
30 May 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Zhizhuo Kou
Liming Yang
Xiao Luo
Jinsheng Huang
...
Jiaming Ji
Xuanzhe Liu
Sirui Han
Ming Zhang
Wenhan Luo
199
14
0
30 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM, LRM
335
19
0
29 May 2025
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
Wenju Sun
Qingyong Li
Wen Wang
Yang Liu
Yangli-ao Geng
Boyang Li
MoMe
312
2
0
29 May 2025
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Shurong Zheng
Fan Yang
Ming Tang
Jinqiao Wang
VLM, LRM
279
1
0
27 May 2025
Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms
Yuanzhe Peng
Jieming Bian
Lei Wang
Yin Huang
Jie Xu
211
1
0
27 May 2025
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang
Pinxin Liu
Mingqian Feng
Rui Mao
...
Hang Hua
Ali Vosoughi
Luchuan Song
Zeliang Zhang
Chenliang Xu
LRM
473
4
0
26 May 2025
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang
Lingling Zhang
Jie Ma
Han Lai
Fangzhi Xu
Yifei Li
Wenjun Wu
Yaqiang Wu
Jun Liu
LRM
281
5
0
25 May 2025
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
239
0
0
24 May 2025
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Zhenglin Huang
Tianxiao Li
Xiangtai Li
Haiquan Wen
Yiwei He
...
Hao Fei
Xi Yang
Xiaowei Huang
Bei Peng
Guangliang Cheng
711
6
0
24 May 2025
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2025
G. Meng
Sunan He
Jinpeng Wang
Tao Dai
Letian Zhang
Jieming Zhu
Qing Li
Gang Wang
Rui Zhang
Yong Jiang
VLM
471
5
0
24 May 2025
Reasoning Segmentation for Images and Videos: A Survey
Yiqing Shen
Chenjia Li
Fei Xiong
Jeong-O Jeong
Tianpeng Wang
Michael Latman
Mathias Unberath
VOS
430
9
0
24 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
608
3
0
23 May 2025
Learning Shared Representations from Unpaired Data
Amitai Yacobi
Nir Ben-Ari
Ronen Talmon
Uri Shaham
SSL
295
0
0
23 May 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
447
1
0
23 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Xuetao Zhang
LRM
531
18
0
18 May 2025
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Jiajun Qin
Yuan Pu
Zhuolun He
Seunggeun Kim
David Z. Pan
Bei Yu
363
3
0
17 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Shibin Mei
Hang Wang
Bingbing Ni
317
0
0
16 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
356
0
0
16 May 2025
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru
Rui Meng
Wenshu Fan
Karthikeyan K
Mingyi Su
Ping Nie
Semih Yavuz
Yingbo Zhou
Lei Ma
Bhuwan Dhingra
356
12
0
16 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
539
4
0
15 May 2025
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
Wenju Sun
Qingyong Li
Yangli-ao Geng
Boyang Li
MoMe
329
6
0
11 May 2025
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries
International Conference on Applications of Natural Language to Data Bases (NLDB), 2025
Jinze Lv
Jian Chen
Zi Long
Xianghua Fu
Yin Chen
VGen
328
0
0
09 May 2025