Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 917 papers shown
Title
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
32
0
0
29 Nov 2025
Qwen3-VL Technical Report
Shuai Bai
Yuxuan Cai
Ruizhe Chen
Keqin Chen
Xionghui Chen
...
Jingren Zhou
F. I. S. Kevin Zhou
J. Zhou
Yuanzhi Zhu
Ke Zhu
VLM
871
32
0
26 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
109
0
0
25 Nov 2025
Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning
Qihan Huang
H. Zhang
Rong Wei
Yi Wang
Rui Tang
Mingli Song
Jie Song
88
0
0
24 Nov 2025
Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving
J. N. Han
Meng Tian
Jiangtong Zhu
Fan He
Huixin Zhang
...
Siyuan Dong
Lu Hou
Qingqiu Huang
Xiaosong Jia
H. Xu
VLM
91
1
0
24 Nov 2025
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
Mark Endo
Serena Yeung-Levy
LRM
201
0
0
21 Nov 2025
VideoSeg-R1:Reasoning Video Object Segmentation via Reinforcement Learning
Zishan Xu
Yifu Guo
Y. Lu
Fengyu Yang
J. Li
VOS
184
0
0
20 Nov 2025
LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
X. Shi
Silin Cheng
Sirui Zhao
Yunhan Jiang
Enhong Chen
Yang Liu
Sebastien Ourselin
140
0
0
15 Nov 2025
Fast Reasoning Segmentation for Images and Videos
Yiqing Shen
Mathias Unberath
VLM LRM
139
0
0
15 Nov 2025
NOVO: Bridging LLaVA and SAM with Visual-only Prompts for Reasoning Segmentation
Kyung-Yoon Yoon
Yeong-Jun Cho
MLLM VLM
358
0
0
10 Nov 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Yiyang Zhou
Haoqin Tu
Z. Wang
Zeyu Wang
Niklas Muennighoff
...
Shen Yan
Haoqi Fan
Cihang Xie
Huaxiu Yao
Qinghao Ye
LRM
218
2
0
04 Nov 2025
UniSOT: A Unified Framework for Multi-Modality Single Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yinchao Ma
Yuyang Tang
Wenfei Yang
Tianzhu Zhang
Xu Zhou
Feng Wu
176
0
0
03 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM VLM
458
1
0
31 Oct 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM MoE VLM
270
2
0
28 Oct 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
310
0
0
27 Oct 2025
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
Lu Zhang
Jiazuo Yu
Haomiao Xiong
Ping Hu
Yunzhi Zhuge
Huchuan Lu
You He
LRM
93
0
0
24 Oct 2025
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model
Xiaolong Wang
Lixiang Ru
Ziyuan Huang
Kaixiang Ji
Dandan Zheng
Jingdong Chen
Jun Zhou
VLM
81
0
0
23 Oct 2025
Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu
Peng Gao
Lewei Lu
Xiaoqin Zhang
Ling Shao
Shijian Lu
LRM
96
0
0
16 Oct 2025
Vision-Centric Activation and Coordination for Multimodal Large Language Models
Yunnan Wang
Fan Lu
Kecheng Zheng
Ziyuan Huang
Ziqiang Li
Wenjun Zeng
Xin Jin
MLLM
308
0
0
16 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
124
0
0
16 Oct 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
Xinyi Chen
Yilun Chen
Y. Fu
Ning Gao
Jiaya Jia
...
Jinyu Zhang
Shi Zhang
Feng Zheng
Bowen Zhou
Y. Zhu
LM&Ro LRM
147
5
0
15 Oct 2025
Detect Anything via Next Point Prediction
Qing Jiang
Junan Huo
Xingyu Chen
Yuda Xiong
Zhaoyang Zeng
Yihao Chen
Tianhe Ren
Junzhi Yu
Lei Zhang
ObjD
191
11
0
14 Oct 2025
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Zhenyu Lu
Liupeng Li
Jinpeng Wang
Yan Feng
Bin Chen
Ke Chen
Yaowei Wang
LRM
82
0
0
13 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
90
0
0
12 Oct 2025
Vision Language Models: A Survey of 26K Papers
Fengming Lin
3DV VLM
107
0
0
10 Oct 2025
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang
Jieyu Zhang
Taoyang Jia
Chenhao Zheng
Ziqi Gao
J. S. Park
Winson Han
Ranjay Krishna
185
0
0
10 Oct 2025
LTCA: Long-range Temporal Context Attention for Referring Video Object Segmentation
C. Yan
Jingyun Wang
Guoliang Kang
VOS
173
1
0
09 Oct 2025
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
Ci-Siang Lin
Min-Hung Chen
I-Jieh Liu
Chien-Yi Wang
Sifei Liu
Yu-Chun Wang
VOS
146
0
0
08 Oct 2025
Referring Expression Comprehension for Small Objects
Kanoko Goto
Takumi Hirose
Mahiro Ukai
Shuhei Kurita
Nakamasa Inoue
ObjD
123
1
0
04 Oct 2025
UGround: Towards Unified Visual Grounding with Unrolled Transformers
Rui Qian
Xin Yin
Chuanhang Deng
Zhiyuan Peng
Jian Xiong
Wei Zhai
Dejing Dou
119
0
0
04 Oct 2025
CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning
Qihua Dong
Luis Figueroa
Handong Zhao
Kushal Kafle
Jason Kuen
Zhihong Ding
Scott D. Cohen
Y. Fu
ObjD LRM
180
0
0
03 Oct 2025
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Yongyi Su
H. Zhang
Shijie Li
Nanqing Liu
Jingyi Liao
...
Chen Li
Nancy F. Chen
Shuicheng Yan
Xulei Yang
Xun Xu
MLLM VLM
158
3
0
02 Oct 2025
PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset
Thomas Campagnolo
Ezio Malis
Philippe Martinet
Gaetan Bahl
3DV
80
0
0
01 Oct 2025
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Wei-Yao Wang
Kazuya Tateishi
Qiyu Wu
Shusuke Takahashi
Yuki Mitsufuji
VLM
107
0
0
01 Oct 2025
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Dengming Zhang
Xiaowen Ma
Zhenliang Ni
Zhenkai Wu
Han Shu
Xin Jiang
Xinghao Chen
MoMe
140
2
0
30 Sep 2025
Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
Haotian Xue
Yunhao Ge
Y. Zeng
Zhaoshuo Li
Ming-Yu Liu
Yongxin Chen
JiaoJiao Fan
104
1
0
30 Sep 2025
GroundSight: Augmenting Vision-Language Models with Grounding Information and De-hallucination
Xinxi Chen
Tianyang Chen
Lijia Hong
HILM
36
0
0
30 Sep 2025
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
Peng Liu
H. Shen
Chunxin Fang
Zhicheng Sun
Jiajia Liao
T. Zhao
MLLM ObjD VLM LRM
193
2
0
30 Sep 2025
ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation
Shilan Zhang
J. Huang
Ruilin Yao
Cong Wang
Yaxiong Chen
Peng Xu
Shengwu Xiong
104
0
0
28 Sep 2025
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
Na Min An
Inha Kang
Minhyun Lee
Hyunjung Shim
VLM
137
0
0
27 Sep 2025
MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning
Lihao Zheng
Jiawei Chen
Xintian Shen
Hao Ma
Tao Wei
ObjD OffRL VLM LRM
124
0
0
26 Sep 2025
GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions
Bing Liu
Wenqiang Yv
X. J. Yang
S. Wang
Junzhuo Liu
Peng Wang
G. Wang
Yang Yang
H. Shen
ObjD
139
0
0
25 Sep 2025
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan
Xinhao Li
Yinan He
Zhengrong Yue
Xiangyu Zeng
Yali Wang
Yu Qiao
Limin Wang
Yi Wang
MLLM VLM LRM
189
10
0
25 Sep 2025
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu
Zongyang Ma
Junfu Pu
Zhongang Qi
Yang Wu
Mingyu Ding
Chang Wen Chen
MLLM ObjD LRM
311
2
0
22 Sep 2025
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Quanzhu Niu
Dengxian Gong
Shihao Chen
Tao Zhang
Yikang Zhou
Haobo Yuan
Lu Qi
Xiangtai Li
Shilin Xu
VOS
224
0
0
21 Sep 2025
Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization
Xu Jia
101
0
0
19 Sep 2025
Re-purposing SAM into Efficient Visual Projectors for MLLM-Based Referring Image Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
95
0
0
17 Sep 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu
Shengwu Xiong
Jiajun Zhang
Yaxiong Chen
Bowen Zhou
...
Yang Yang
Yanglin Deng
Yashu Kang
Ye Yuan
Y. Wen
LRM
107
1
0
17 Sep 2025
Mitigating Query Selection Bias in Referring Video Object Segmentation
Dingwei Zhang
Dong Zhang
Jinhui Tang
93
0
0
17 Sep 2025
Improving Generalized Visual Grounding with Instance-aware Joint Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Ming Dai
Wenxuan Cheng
Jiang-Jiang Liu
Lingfeng Yang
Zhenhua Feng
Wankou Yang
Jingdong Wang
ObjD ISeg
207
3
0
17 Sep 2025