Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 917 papers shown
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zifu Wan
Yaqi Xie
Ce Zhang
Zhiqiu Lin
Zihan Wang
Simon Stepputtis
Deva Ramanan
Katia Sycara
163
3
0
23 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
587
3
0
23 May 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
256
1
0
22 May 2025
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Computer Vision and Pattern Recognition (CVPR), 2025
Yongshuo Zong
Qin Zhang
Dongsheng An
Zhihua Li
Xiang Xu
Linghan Xu
Zhuowen Tu
Yifan Xing
Onkar Dabeer
ObjD
257
3
0
20 May 2025
Advancing Sequential Numerical Prediction in Autoregressive Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiang Fei
Jinghui Lu
Qi Sun
Hao Feng
Yanjie Wang
Wei Shi
An-Lan Wang
Jingqun Tang
Can Huang
AI4TS
515
5
0
19 May 2025
Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding
Xuefei Sun
Doncey Albin
Cecilia Mauceri
Dusty Woods
Christoffer Heckman
LRM
173
1
0
18 May 2025
Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
Lucas Choi
Ross Greer
VLM
355
1
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
296
3
0
14 May 2025
SITE: towards Spatial Intelligence Thorough Evaluation
Wenjie Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong
233
6
0
08 May 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Cheng Chi
Minglan Lin
Yaoxu Lyu
...
Yulong Ao
Yonghua Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
LM&Ro
334
7
0
06 May 2025
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen
Runmin Cong
Yuzhi Zhao
Hongzheng Yang
Guangneng Hu
H. Ip
Sam Kwong
CLL, KELM
306
7
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
241
2
0
03 May 2025
Multimodal Language Models See Better When They Look Shallower
Huajun Chen
Junyan Lin
Xinhao Chen
Yue Fan
Jianfeng Dong
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
318
4
0
30 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
436
3
0
22 Apr 2025
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
Jingchao Wang
Hong Wang
Wenlong Zhang
Kunhua Ji
Dingjiang Huang
Yefeng Zheng
ObjD
326
2
0
22 Apr 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Weiye Xu
Jun Wang
Weiyun Wang
Zhe Chen
Wengang Zhou
...
Xiaohua Wang
Xizhou Zhu
Wenhai Wang
Jifeng Dai
Jinguo Zhu
VLM, LRM
380
39
0
21 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Pattern Recognition (Pattern Recogn.), 2025
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
424
0
0
20 Apr 2025
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun
Junbin Xiao
Tze Ho Elden Tse
Yicong Li
Arjun Akula
Angela Yao
EgoV
227
1
0
18 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
509
738
1
14 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM, OffRL, LRM
267
52
0
10 Apr 2025
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu
Qizhi Chen
Zechuan Wang
Yiwen Tang
Yiting Zhang
Chi Yan
Dong Wang
Xiaochen Li
Jiangwei Zhong
CoGe
449
5
0
10 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Jiayi Zhang
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
224
1
0
05 Apr 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Ting Liu
Siyuan Li
229
7
0
01 Apr 2025
InstructRestore: Region-Customized Image Restoration with Human Instructions
Shixuan Liu
Jianqi Ma
Lingchen Sun
Xiangtao Kong
Lei Zhang
DiffM
201
1
0
31 Mar 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
Tianming Liang
Haichao Jiang
Wei-Shi Zheng
Jian-Fang Hu
255
1
0
30 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Computer Vision and Pattern Recognition (CVPR), 2025
J. Huang
Baoxiong Jia
Longji Xu
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
337
17
0
28 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen, AuLLM
800
320
0
26 Mar 2025
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation
Niccolo Avogaro
Thomas Frick
Mattia Rigotti
Andrea Bartezzaghi
Filip M. Janicki
Cristiano Malossi
Konrad Schindler
Roy Assaf
MLLM, VLM
224
2
0
25 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
449
7
0
19 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
International Conference on Learning Representations (ICLR), 2025
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
211
17
0
18 Mar 2025
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Haiyang Guo
Fanhu Zeng
Ziwei Xiang
Fei Zhu
Da-Han Wang
Xu-Yao Zhang
Cheng-Lin Liu
353
10
0
17 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Yang Li
Xiaoyi Feng
Maosong Sun
VLM, LRM
255
12
0
17 Mar 2025
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL, FedML
483
6
0
17 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
277
8
0
17 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Weihao Ye
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
409
23
0
17 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
440
4
0
13 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
855
12
0
11 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Computer Vision and Pattern Recognition (CVPR), 2025
Huanyi Zheng
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Yongxu Liu
M. Yang
Chunhua Shen
MLLM, VLM
229
9
0
11 Mar 2025
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Huilin Deng
Ding Zou
Rui Ma
Hongchen Luo
Yang Cao
Yu Kang
LRM, VLM
256
47
0
10 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL, LRM
385
21
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
260
29
0
08 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Computer Vision and Pattern Recognition (CVPR), 2025
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Jianfeng Dong
Hui Su
Jinlan Fu
Xiaoyu Shen
206
10
0
08 Mar 2025
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
Suhwan Cho
Seunghoon Lee
Minhyeok Lee
Jungho Lee
Sangyoun Lee
VOS
408
3
0
05 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
298
5
0
04 Mar 2025
Teaching Metric Distance to Discrete Autoregressive Language Models
Jiwan Chung
Saejin Kim
Yongrae Jo
Jinho Park
Dongjun Min
Youngjae Yu
501
0
0
04 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Nianzu Yang
Yun Zheng
Liwei Wang
ObjD, VLM
365
12
0
03 Mar 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yun Wang
Jingchen Ni
Yong-Jin Liu
Chun Yuan
Yansong Tang
251
13
0
02 Mar 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
X. J. Yang
Jing Liu
Peng Wang
Guoqing Wang
Yue Yang
Jikang Cheng
ObjD
426
2
0
27 Feb 2025
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
525
7
0
24 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
356
4
0
24 Feb 2025