1511.02283
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
Title
TFANet: Three-Stage Image-Text Feature Alignment Network for Robust Referring Image Segmentation
Qianqi Lu
Yuxiang Xie
Jing Zhang
Shiwei Zou
Yan Chen
Xidao Luan
122
0
0
16 Sep 2025
PATIMT-Bench: A Multi-Scenario Benchmark for Position-Aware Text Image Machine Translation in Large Vision-Language Models
Wanru Zhuang
Wenbo Li
Zhibin Lan
Xu Han
P. Li
Jinsong Su
VLM
104
0
0
14 Sep 2025
Zero-Shot Referring Expression Comprehension via Vison-Language True/False Verification
Jeffrey Liu
Rongbin Hu
ObjD
158
0
0
12 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
272
3
0
12 Sep 2025
Text4Seg++: Advancing Image Segmentation via Generative Language Modeling
Mengcheng Lan
Chaofeng Chen
Jiaxing Xu
Zongrui Li
Yiping Ke
Xudong Jiang
Yingchen Yu
Yunqing Zhao
S. Bai
VLM
148
1
0
08 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
153
0
0
08 Sep 2025
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai
Wenxuan Cheng
Jiedong Zhuang
Jiang-Jiang Liu
Hongshen Zhao
Zhenhua Feng
Wankou Yang
ObjD
189
3
0
05 Sep 2025
Guideline-Consistent Segmentation via Multi-Agent Refinement
Vanshika Vats
Ashwani Rathee
James Davis
VLM
184
0
0
04 Sep 2025
VoCap: Video Object Captioning and Segmentation from Any Prompt
J. Uijlings
Xingyi Zhou
Xiuye Gu
Arsha Nagrani
Anurag Arnab
Alireza Fathi
David A. Ross
Cordelia Schmid
VOS
VLM
224
1
0
29 Aug 2025
GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
Kei Katsumata
Yui Iioka
Naoki Hosomi
Teruhisa Misu
Kentaro Yamada
K. Sugiura
83
0
0
28 Aug 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM
LRM
270
221
0
25 Aug 2025
RynnEC: Bringing MLLMs into Embodied World
Ronghao Dang
Yuqian Yuan
Yunxuan Mao
Kehan Li
Jiangpin Liu
Zhikai Wang
Xin Li
F. Wang
Deli Zhao
VGen
LM&Ro
201
4
0
19 Aug 2025
ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images
Wenjie Liao
Jieyu Yuan
Yifang Xu
Chunle Guo
Zilong Zhang
...
Jiachen Fu
Haotian Fan
Tao Li
Junhui Cui
Chongyi Li
93
0
0
18 Aug 2025
Ovis2.5 Technical Report
Shiyin Lu
Yan Zhao
Yu Xia
Yuwei Hu
Shanshan Zhao
...
Yuhui Chen
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
VLM
LRM
111
29
0
15 Aug 2025
KnowDR-REC: A Benchmark for Referring Expression Comprehension with Real-World Knowledge
Guanghao Jin
Jingpei Wu
Tianpei Guo
Yiyi Niu
Weidong Zhou
Guoyang Liu
105
1
0
12 Aug 2025
SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
Computers & graphics (Comput. Graph.), 2025
T. Nguyen
Viet-Tham Huynh
Quang-Thuc Nguyen
H. Nguyen
Long Le Bao
...
Dinh-Khoi Vo
Van-Loc Nguyen
Trung-Truc Huynh-Le
Tam V. Nguyen
Minh-Triet Tran
3DV
96
1
0
12 Aug 2025
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Weitai Kang
Weiming Zhuang
Zhizhong Li
Yan Yan
Lingjuan Lyu
98
1
0
11 Aug 2025
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang
Yubo Wang
Haoyu Cao
Linli Xu
64
0
0
09 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
197
4
0
08 Aug 2025
Latent Expression Generation for Referring Image Segmentation and Grounding
S. Yu
Joonbeom Hong
Joonseok Lee
Jeany Son
ObjD
189
1
0
07 Aug 2025
Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder
Jingchao Wang
Zhijian Wu
Dingjiang Huang
Yefeng Zheng
Hong Wang
108
0
0
06 Aug 2025
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding
Yidan Wang
Chenyi Zhuang
Wutao Liu
Pan Gao
Nicu Sebe
ObjD
194
0
0
05 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
330
10
0
01 Aug 2025
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo Liang
Yiwu Zhong
Zi-Yuan Hu
Yeyao Tao
Liwei Wang
EgoV
214
4
0
01 Aug 2025
Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
Weide Liu
Wei Zhou
Jun Liu
Ping Hu
Jun Cheng
Jungong Han
Weisi Lin
3DV
179
3
0
30 Jul 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying
Henghui Ding
Guangquan Jie
Yu Jiang
VOS
269
5
0
30 Jul 2025
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
X. Feng
Shuyan Hu
X. Li
D. Zhang
M. Wu
Jie Zhang
Xiaosha Chen
K. Huang
142
3
0
26 Jul 2025
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang
Qirui Chen
Cilin Yan
Jiayin Cai
Xiaolong Jiang
Yao Hu
Weidi Xie
Stratis Gavves
MLLM
VOS
208
4
0
25 Jul 2025
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
Yizhi Hu
Zezhao Tian
Xingqun Qi
Chen Su
Bingkun Yang
Junhui Yin
Muyi Sun
Man Zhang
Zhenan Sun
ObjD
126
0
0
22 Jul 2025
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation
Weihuang Lin
Yiwei Ma
Xiaoshuai Sun
Shuting He
Jiayi Ji
Liujuan Cao
Rongrong Ji
114
1
0
17 Jul 2025
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai
Wenxuan Cheng
Jiang-Jiang Liu
Sen Yang
Wenxiao Cai
Yanpeng Sun
Wankou Yang
164
6
0
02 Jul 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
177
0
0
01 Jul 2025
Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
Bob Zhang
Haoran Li
Tao Zhang
Cilin Yan
Jiayin Cai
Yanbin Hao
OffRL
LRM
188
4
0
01 Jul 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
MLLM
LRM
533
2
0
01 Jul 2025
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Zhihao Zhang
Qiaole Dong
Qi Zhang
Jun Zhao
Enyu Zhou
...
Yanwei Fu
Changzhi Sun
Tao Gui
Xuanjing Huang
Kai Chen
CLL
172
0
0
30 Jun 2025
MDC-R: The Minecraft Dialogue Corpus with Reference
Chris Madge
Maris Camilleri
Paloma Carretero García
Vanja Karan
Juexi Shao
Prashant Jayannavar
Julian Hough
Benjamin Roth
Massimo Poesio
95
2
0
27 Jun 2025
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh
J. Mok
Juhyeon Shin
Sangha Park
Sungroh Yoon
VLM
318
1
0
23 Jun 2025
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Fan Yang
Yousong Zhu
Xin Li
Yufei Zhan
Hongyin Zhao
Shurong Zheng
Yaowei Wang
Ming Tang
Jinqiao Wang
MLLM
VLM
210
0
0
20 Jun 2025
ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections
Ziling Huang
Yidan Zhang
Shin'ichi Satoh
ObjD
159
1
0
18 Jun 2025
Synthetic Visual Genome
Computer Vision and Pattern Recognition (CVPR), 2025
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
180
0
0
09 Jun 2025
Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations
Yizhen Li
Dell Zhang
Xuelong Li
Yiqing Shen
VLM
152
0
0
09 Jun 2025
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao
Zijun Wei
Jason Kuen
Kangning Liu
Lingzhi Zhang
Jiuxiang Gu
HyunJoon Jung
Liang-Yan Gui
Yu Wang
VLM
310
2
0
05 Jun 2025
R2SM: Referring and Reasoning for Selective Masks
Yu-Lin Shih
Wei-En Tai
Cheng Sun
Y. Wang
Hwann-Tzong Chen
ISeg
283
0
0
02 Jun 2025
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Duzhen Zhang
Yong Ren
Zhong-Zhi Li
Yahan Yu
Jiahua Dong
Chenxing Li
Zhilong Ji
Jinfeng Bai
CLL
207
4
0
31 May 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
254
5
0
28 May 2025
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs
Zhehan Kan
Y. Liu
Kun Yin
Xinghua Jiang
Xin Li
...
Yinsong Liu
Shihong Deng
Xing Sun
Qingmin Liao
Wenming Yang
LRM
243
1
0
27 May 2025
RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson
Deva Ramanan
Neehar Peri
351
6
0
27 May 2025
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
Alaa Dalaq
Muzammil Behzad
VLM
381
0
0
25 May 2025
Reasoning Segmentation for Images and Videos: A Survey
Yiqing Shen
Chenjia Li
Fei Xiong
Jeong-O Jeong
Tianpeng Wang
Michael Latman
Mathias Unberath
VOS
396
8
0
24 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
587
3
0
23 May 2025