1511.02283
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
485
1,519
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
87
6
0
24 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
IEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
206
22
0
23 Aug 2023
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Chongyan Chen
Samreen Anjum
Danna Gurari
202
15
0
21 Aug 2023
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li
Chi Zhang
Gang Yu
Zhibin Wang
Bin-Bin Fu
Guosheng Lin
Chunhua Shen
Ling Chen
Yunchao Wei
MLLM
148
35
0
20 Aug 2023
Whether you can locate or not? Interactive Referring Expression Generation
ACM Multimedia (ACM MM), 2023
Fulong Ye
Yuxing Long
Fangxiang Feng
Xiaojie Wang
183
8
0
19 Aug 2023
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wang
Sihan Chen
Qingbin Liu
ObjD
VLM
258
2
0
18 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
482
6
0
18 Aug 2023
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
IEEE International Conference on Computer Vision (ICCV), 2023
Henghui Ding
Chang Liu
Shuting He
Xudong Jiang
Chen Change Loy
VOS
259
201
0
16 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Neural Information Processing Systems (NeurIPS), 2023
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
127
21
0
11 Aug 2023
Learning Referring Video Object Segmentation from Weak Annotation
Wangbo Zhao
Ke Nan
Songyang Zhang
Kai-xiang Chen
Dahua Lin
Yang You
VOS
205
6
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
219
116
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
149
32
0
03 Aug 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
IEEE International Conference on Computer Vision (ICCV), 2023
Yu Wu
Yan-Tao Wei
Haozhe Jasper Wang
Yongfei Liu
Sibei Yang
Xuming He
215
12
0
02 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
449
704
0
01 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
238
55
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
408
150
0
25 Jul 2023
Spectrum-guided Multi-granularity Referring Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Miao
Bennamoun
Yongsheng Gao
Lin Wang
VOS
217
62
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
218
47
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
247
6
0
23 Jul 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
165
29
0
21 Jul 2023
Divert More Attention to Vision-Language Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
233
13
0
19 Jul 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
188
7
0
17 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
182
9
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
793
311
0
07 Jul 2023
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation
Neural Networks (Neural Netw.), 2023
Yonglin Li
Jing Zhang
Xiao Teng
Long Lan
VOS
VLM
190
24
0
03 Jul 2023
Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for Referring Video Object Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Meng Lan
Fu Rong
Zuchao Li
Wei Yu
Guang Dai
VOS
327
11
0
02 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
418
801
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
337
1,006
0
26 Jun 2023
Mutual Query Network for Multi-Modal Product Image Segmentation
IEEE International Conference on Multimedia and Expo (ICME), 2023
Xu Tan
Wei Feng
Zheng Zhang
Xiancong Ren
Yaoyu Li
Jing Lv
Xinshuai Zhu
Zhangang Lin
Jingping Shao
95
0
0
26 Jun 2023
A Survey on Multimodal Large Language Models
National Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
405
953
0
23 Jun 2023
WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Ze-Long Cheng
Peng Jin
Hao Li
Kehan Li
Siheng Li
Xiang Ji
Chang-rui Liu
Jie Chen
141
7
0
19 Jun 2023
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Linfeng Yuan
Miaojing Shi
Zijie Yue
Qijun Chen
VOS
187
22
0
14 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD
VLM
175
12
0
14 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
235
19
0
14 Jun 2023
Referring to Screen Texts with Voice Assistants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shruti Bhargava
Anand Dhoot
I. Jonsson
Hoang Long Nguyen
Alkesh Patel
Hong-ye Yu
Vincent Renkens
161
2
0
10 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
196
4
0
07 Jun 2023
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding
Oceans (OCEANS), 2023
Tan-Sang Ha
Hai Nguyen-Truong
Tuan-Anh Vu
Sai-Kit Yeung
122
0
0
07 Jun 2023
Fine-Grained Visual Prompting
Neural Information Processing Systems (NeurIPS), 2023
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
197
96
0
07 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
252
48
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
137
29
0
06 Jun 2023
GRES: Generalized Referring Expression Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Chang Liu
Henghui Ding
Xudong Jiang
329
236
0
01 Jun 2023
Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ece Takmaz
Nicolo' Brandizzi
Mario Giulianelli
Sandro Pezzelle
Raquel Fernández
193
7
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
British Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
215
8
0
30 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Guoying Gu
Yu Qiao
Hao Dong
Zhongjiang He
Shiyang Feng
VOS
266
55
0
25 May 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
267
63
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
128
3
0
24 May 2023
MMNet: Multi-Mask Network for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wan
Qingbin Liu
EgoV
214
2
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
186
9
0
24 May 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
International Conference on 3D Vision (3DV), 2023
Taiki Miyanishi
Daichi Azuma
Shuhei Kurita
M. Kawanabe
218
10
0
23 May 2023