arXiv:1908.03195
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2019
8 August 2019
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
Papers citing
"LVIS: A Dataset for Large Vocabulary Instance Segmentation"
50 / 1,056 papers shown
Title
Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Joan Nwatu
Longju Bai
Oana Ignat
Rada Mihalcea
20
0
0
02 Dec 2025
FOM-Nav: Frontier-Object Maps for Object Goal Navigation
Thomas Chabal
Shizhe Chen
Jean Ponce
Cordelia Schmid
56
0
0
30 Nov 2025
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
68
0
0
29 Nov 2025
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Apratim Bhattacharyya
Bicheng Xu
Sanjay Haresh
Reza Pourreza
Litian Liu
Sunny Panchal
Pulkit Madan
Leonid Sigal
Roland Memisevic
104
0
0
27 Nov 2025
OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection
Chujie Wang
Jianyu Lu
Zhiyuan Luo
Xi Chen
Chu He
LM&Ro
242
0
0
26 Nov 2025
NNGPT: Rethinking AutoML with Large Language Models
Roman Kochnev
Waleed Khalid
Tolgay Atinc Uzun
X. Zhang
Yashkumar Sanjaybhai Dhameliya
...
Chandini Vysyaraju
Raghuvir Duvvuri
Avi Goyal
D. Ignatov
Radu Timofte
LM&MA
LRM
195
5
0
25 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
121
0
0
25 Nov 2025
State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection
Jiaying Zhou
Qingchao Chen
104
0
0
22 Nov 2025
SAM 3D: 3Dfy Anything in Images
SAM 3D Team
Xingyu Chen
Fu-Jen Chu
Pierre Gleize
Kevin J. Liang
...
Bowen Zhang
Piotr Dollár
Georgia Gkioxari
Matt Feiszli
Jitendra Malik
319
4
0
20 Nov 2025
RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation
Xiaoshuai Hao
Yingbo Tang
Lingfeng Zhang
Yanbiao Ma
Yunfeng Diao
Ziyu Jia
Wenbo Ding
Hangjun Ye
L. Chen
LM&Ro
213
0
0
16 Nov 2025
GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding
Athul M. Mathew
Haithem Hermassi
Thariq Khalid
Arshad Ali Khan
R. Souissi
96
0
0
09 Nov 2025
iFlyBot-VLM Technical Report
Xin Nie
Zhiyuan Cheng
Yuan Zhang
Chao Ji
Jiajia Wu
Yuhan Zhang
Jia Pan
LM&Ro
318
0
0
07 Nov 2025
In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy
Shreyan Ganguly
Angona Biswas
Jaydeep Rade
Md Hasibul Hasan Hasib
Nabila Masud
...
Ushashi Bhattacharjee
Aditya Balu
A. Sarkar
A. Krishnamurthy
Soumik Sarkar
ObjD
VLM
224
0
0
04 Nov 2025
OLATverse: A Large-scale Real-world Object Dataset with Precise Lighting Control
Xilong Zhou
Jianchun Chen
Pramod Rao
Timo Teufel
Linjie Lyu
Tigran Minasian
Oleksandr Sotnychenko
Xiao-Xiao Long
Marc Habermann
Christian Theobalt
206
1
0
04 Nov 2025
TRACE: Textual Reasoning for Affordance Coordinate Extraction
S. Park
Jin Kim
Yuchen Cui
Matthew S. Brown
LM&Ro
LRM
299
0
0
03 Nov 2025
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
Yang Miao
Jan-Nico Zaech
Xi Wang
Fabien Despinoy
Danda Pani Paudel
Luc Van Gool
VLM
314
0
0
29 Oct 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
350
0
0
27 Oct 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Haochen Wang
Yuhao Wang
Tao Zhang
Yikang Zhou
Yanwei Li
...
Anran Wang
Yunhai Tong
Z. Wang
X. Li
Zhaoxiang Zhang
VLM
197
0
0
21 Oct 2025
BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining
Ajinkya Khoche
Gergő László Nagy
Maciej K. Wozniak
Thomas Gustafsson
Patric Jensfelt
126
0
0
21 Oct 2025
Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis
Xinhao Cai
Liulei Li
Gensheng Pei
Tao Chen
Jinshan Pan
Yazhou Yao
Wenguan Wang
160
0
0
21 Oct 2025
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Mingxuan Liu
Honglin He
Elisa Ricci
Wayne Wu
Bolei Zhou
VGen
128
0
0
16 Oct 2025
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu
Marta Tintore Gazulla
Yongqin Xian
Luc Van Gool
Federico Tombari
82
0
0
16 Oct 2025
MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos
Gabriel Fiastre
Antoine Yang
Cordelia Schmid
VOS
410
0
0
16 Oct 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD
LRM
VLM
321
1
0
16 Oct 2025
Generative Universal Verifier as Multimodal Meta-Reasoner
Xinchen Zhang
X. Zhang
Youbin Wu
Yanbin Cao
Renrui Zhang
Ruihang Chu
Ling Yang
Yujiu Yang
LRM
164
3
0
15 Oct 2025
Detect Anything via Next Point Prediction
Qing Jiang
Junan Huo
Xingyu Chen
Yuda Xiong
Zhaoyang Zeng
Yihao Chen
Tianhe Ren
Junzhi Yu
Lei Zhang
ObjD
207
11
0
14 Oct 2025
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Ji Ao
Dawei Leng
Yuhui Yin
VLM
233
2
0
13 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
141
1
0
12 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
106
0
0
12 Oct 2025
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang
Jieyu Zhang
Taoyang Jia
Chenhao Zheng
Ziqi Gao
J. S. Park
Winson Han
Ranjay Krishna
217
0
0
10 Oct 2025
Cross-View Open-Vocabulary Object Detection in Aerial Imagery
Jyoti Kini
Rohit Gupta
Mubarak Shah
ObjD
VLM
181
0
0
04 Oct 2025
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
Peng Liu
H. Shen
Chunxin Fang
Zhicheng Sun
Jiajia Liao
T. Zhao
MLLM
ObjD
VLM
LRM
205
2
0
30 Sep 2025
C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection
Siheng Wang
Zhengdao Li
Yanshu Li
Canran Xiao
Haibo Zhan
...
Zhikang Dong
Jifeng Shen
Junhao Dong
Qiang Sun
Piotr Koniusz
ObjD
VLM
220
6
0
27 Sep 2025
Video models are zero-shot learners and reasoners
Thaddäus Wiedemer
Yuxuan Li
Paul Vicol
Shixiang Shane Gu
Nick Matarese
Kevin Swersky
Been Kim
P. Jaini
Robert Geirhos
VLM
LRM
244
50
0
24 Sep 2025
Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity
Guangze Zheng
Shijie Lin
Haobo Zuo
Si Si
Ming-Shan Wang
Changhong Fu
Jia Pan
174
0
0
20 Sep 2025
MMMS: Multi-Modal Multi-Surface Interactive Segmentation
Robin Schon
Julian Lorenz
K. Ludwig
Daniel Kienzle
Rainer Lienhart
120
0
0
16 Sep 2025
Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
Y. Lu
Ziqi Zhang
Chunfeng Yuan
Jun Gao
Congxuan Zhang
Xiaojuan Qi
Bing Li
Weiming Hu
MLLM
VLM
93
0
0
14 Sep 2025
Augment to Segment: Tackling Pixel-Level Imbalance in Wheat Disease and Pest Segmentation
Tianqi Wei
Xin Yu
Zhi Chen
Scott Chapman
Zi-Rui Huang
118
0
0
12 Sep 2025
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
Yuecheng Liu
Dafeng Chi
Shiguang Wu
Zhanguang Zhang
Yuzheng Zhuang
...
Pengwei Xie
David Gamaliel Arcos Bravo
Yingxue Zhang
Jianye Hao
Xingyue Quan
LM&Ro
LRM
162
2
0
11 Sep 2025
When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
Rabin Dulal
Lihong Zheng
M. A. Kabir
92
0
0
08 Sep 2025
Harnessing Object Grounding for Time-Sensitive Video Understanding
Tz-Ying Wu
S. N. Sridhar
Subarna Tripathi
137
0
0
08 Sep 2025
Light-Weight Cross-Modal Enhancement Method with Benchmark Construction for UAV-based Open-Vocabulary Object Detection
Zhenhai Weng
Xinjie Li
Can Wu
Weijie He
Jianfeng Lv
Dong Zhou
Zhongliang Yu
ObjD
VLM
227
0
0
07 Sep 2025
UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features
Haowang Cui
Rui Chen
Tao Luo
Rui Li
Jiaze Wang
129
0
0
05 Sep 2025
InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System
Xianbao Hou
Yonghao He
Zeyd Boukhers
John See
Hu Su
Wei Sui
Cong Yang
DiffM
VLM
117
0
0
03 Sep 2025
Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning
International Computer Science Conference (ICSC), 2025
Satyam Gaba
76
0
0
02 Sep 2025
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
Huang Fang
Mengxi Zhang
Heng Dong
Wei Li
Z. Wang
Qifeng Zhang
Xueyun Tian
Yucheng Hu
Hang Li
LM&Ro
LRM
156
7
0
01 Sep 2025
Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation
Maelic Neau
Zoe Falomir
Cédric Buche
Akihiro Sugimoto
109
0
0
01 Sep 2025
Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods
Qinqian Lei
Bo Wang
R. Tan
VLM
107
0
0
26 Aug 2025
Robust and Label-Efficient Deep Waste Detection
Hassan Abid
Khan Muhammad
M. H. Khan
HAI
VLM
120
0
0
26 Aug 2025
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo
Dahyun Kang
Sanghyun Kim
Yunseon Choi
Minsu Cho
114
0
0
25 Aug 2025