Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 919 papers shown

Reasoning Segmentation for Images and Videos: A Survey

423

24 May 2025

InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

188

23 May 2025

SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data

Dong-Hee Kim

Hyunjee Song

Donghyun Kim

467

23 May 2025

Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation

...

605

23 May 2025

Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models

282

22 May 2025

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

Computer Vision and Pattern Recognition (CVPR), 2025

306

20 May 2025

Advancing Sequential Numerical Prediction in Autoregressive Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

555

19 May 2025

Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding

225

18 May 2025

Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models

Lucas Choi

Ross Greer

VLM

385

14 May 2025

Bias and Generalizability of Foundation Models across Datasets in Breast Mammography

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

321

14 May 2025

SITE: towards Spatial Intelligence Thorough Evaluation

290

08 May 2025

RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration

...

405

06 May 2025

SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

347

05 May 2025

RESAnything: Attribute Prompting for Arbitrary Referring Segmentation

Ruiqi Wang

Hao Zhang

VLM

276

03 May 2025

Multimodal Language Models See Better When They Look Shallower

355

30 Apr 2025

Progressive Language-guided Visual Learning for Multi-Task Visual Grounding

367

22 Apr 2025

Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

494

22 Apr 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

...

432

21 Apr 2025

LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation

Pattern Recognition (Pattern Recogn.), 2025

473

20 Apr 2025

Visual Intention Grounding for Egocentric Assistants

279

18 Apr 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

...

577

790

14 Apr 2025

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

500

10 Apr 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

...

316

10 Apr 2025

Window Token Concatenation for Efficient Visual Large Language Models

269

05 Apr 2025

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

Computer Vision and Pattern Recognition (CVPR), 2025

Ting Liu

Siyuan Li

280

01 Apr 2025

InstructRestore: Region-Customized Image Restoration with Human Instructions

257

31 Mar 2025

ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025

303

30 Mar 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Computer Vision and Pattern Recognition (CVPR), 2025

393

28 Mar 2025

Qwen2.5-Omni Technical Report

...

1.1K

337

26 Mar 2025

Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation

264

25 Mar 2025

Visual Position Prompt for MLLM based Visual Grounding

528

19 Mar 2025

MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation

International Conference on Learning Representations (ICLR), 2025

248

18 Mar 2025

Grounded Chain-of-Thought for Multimodal Large Language Models

455

17 Mar 2025

Federated Continual Instruction Tuning

519

17 Mar 2025

HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

386

17 Mar 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

289

17 Mar 2025

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

337

17 Mar 2025

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

497

13 Mar 2025

Referring to Any Person

932

11 Mar 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Computer Vision and Pattern Recognition (CVPR), 2025

285

11 Mar 2025

Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning

297

10 Mar 2025

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

442

10 Mar 2025

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Computer Vision and Pattern Recognition (CVPR), 2025

235

08 Mar 2025

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Computer Vision and Pattern Recognition (CVPR), 2025

300

08 Mar 2025

Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation

452

05 Mar 2025

Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs

349

04 Mar 2025

Teaching Metric Distance to Discrete Autoregressive Language Models

579

04 Mar 2025

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

453

03 Mar 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis

AAAI Conference on Artificial Intelligence (AAAI), 2025

291

02 Mar 2025

New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

491

27 Feb 2025