LVIS: A Dataset for Large Vocabulary Instance Segmentation

Computer Vision and Pattern Recognition (CVPR), 2019

8 August 2019

Piotr Dollár

Papers citing "LVIS: A Dataset for Large Vocabulary Instance Segmentation"

50 / 1,059 papers shown

Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping

02 Dec 2025

FOM-Nav: Frontier-Object Maps for Object Goal Navigation

30 Nov 2025

Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction

Jiazhen Liu

Mingkuan Feng

Long Chen

29 Nov 2025

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?

Apratim Bhattacharyya

27 Nov 2025

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

26 Nov 2025

NNGPT: Rethinking AutoML with Large Language Models

Yashkumar Sanjaybhai Dhameliya

...

25 Nov 2025

LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight

25 Nov 2025

State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection

Jiaying Zhou

Qingchao Chen

22 Nov 2025

SAM 3D: 3Dfy Anything in Images

...

20 Nov 2025

RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation

16 Nov 2025

GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding

09 Nov 2025

iFlyBot-VLM Technical Report

07 Nov 2025

In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy

Shreyan Ganguly

Angona Biswas

Jaydeep Rade

Md Hasibul Hasan Hasib

Nabila Masud

...

Ushashi Bhattacharjee

04 Nov 2025

OLATverse: A Large-scale Real-world Object Dataset with Precise Lighting Control

Oleksandr Sotnychenko

Xiao-Xiao Long

Marc Habermann

Christian Theobalt

04 Nov 2025

TRACE: Textual Reasoning for Affordance Coordinate Extraction

03 Nov 2025

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation

29 Oct 2025

PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity

27 Oct 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

...

21 Oct 2025

BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining

21 Oct 2025

Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis

21 Oct 2025

CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection

16 Oct 2025

MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos

16 Oct 2025

UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos

16 Oct 2025

MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning

Mattia Segu

Marta Tintore Gazulla

Yongqin Xian

Luc Van Gool

Federico Tombari

16 Oct 2025

Generative Universal Verifier as Multimodal Meta-Reasoner

15 Oct 2025

Detect Anything via Next Point Prediction

14 Oct 2025

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

13 Oct 2025

Unified Open-World Segmentation with Multi-Modal Prompts

12 Oct 2025

Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey

12 Oct 2025

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding

10 Oct 2025

Cross-View Open-Vocabulary Object Detection in Aerial Imagery

04 Oct 2025

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

30 Sep 2025

C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection

...

27 Sep 2025

Video models are zero-shot learners and reasoners

24 Sep 2025

Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity

20 Sep 2025

MMMS: Multi-Modal Multi-Surface Interactive Segmentation

16 Sep 2025

Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations

14 Sep 2025

Augment to Segment: Tackling Pixel-Level Imbalance in Wheat Disease and Pest Segmentation

12 Sep 2025

OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

...

David Gamaliel Arcos Bravo

11 Sep 2025

When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection

Rabin Dulal

Lihong Zheng

M. A. Kabir

08 Sep 2025

Harnessing Object Grounding for Time-Sensitive Video Understanding

Tz-Ying Wu

S. N. Sridhar

Subarna Tripathi

08 Sep 2025

Light-Weight Cross-Modal Enhancement Method with Benchmark Construction for UAV-based Open-Vocabulary Object Detection

07 Sep 2025

UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features

05 Sep 2025

InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System

03 Sep 2025

Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning

International Computer Science Conference (ICSC), 2025

Satyam Gaba

02 Sep 2025

Robix: A Unified Model for Robot Interaction, Reasoning and Planning

01 Sep 2025

Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation

01 Sep 2025

Robust and Label-Efficient Deep Waste Detection

26 Aug 2025

Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods

26 Aug 2025

Object Detection with Multimodal Large Vision-Language Models: An In-depth Review

Information Fusion (Inf. Fusion), 2025

Ranjan Sapkota

Manoj Karkee

25 Aug 2025