Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 917 papers shown
Title
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM VLM ObjD
485
1,519
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
87
6
0
24 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
IEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
206
22
0
23 Aug 2023
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Chongyan Chen
Samreen Anjum
Danna Gurari
202
15
0
21 Aug 2023
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li
Chi Zhang
Gang Yu
Zhibin Wang
Bin-Bin Fu
Guosheng Lin
Chunhua Shen
Ling Chen
Yunchao Wei
MLLM
148
35
0
20 Aug 2023
Whether you can locate or not? Interactive Referring Expression Generation
ACM Multimedia (ACM MM), 2023
Fulong Ye
Yuxing Long
Fangxiang Feng
Xiaojie Wang
183
8
0
19 Aug 2023
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wang
Sihan Chen
Qingbin Liu
ObjD VLM
258
2
0
18 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
482
6
0
18 Aug 2023
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
IEEE International Conference on Computer Vision (ICCV), 2023
Henghui Ding
Chang Liu
Shuting He
Xudong Jiang
Chen Change Loy
VOS
259
201
0
16 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Neural Information Processing Systems (NeurIPS), 2023
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
127
21
0
11 Aug 2023
Learning Referring Video Object Segmentation from Weak Annotation
Wangbo Zhao
Ke Nan
Songyang Zhang
Kai-xiang Chen
Dahua Lin
Yang You
VOS
205
6
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM MLLM
219
116
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
149
32
0
03 Aug 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
IEEE International Conference on Computer Vision (ICCV), 2023
Yu Wu
Yan-Tao Wei
Haozhe Jasper Wang
Yongfei Liu
Sibei Yang
Xuming He
215
12
0
02 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro VLM MLLM LRM
449
704
0
01 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
238
55
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
408
150
0
25 Jul 2023
Spectrum-guided Multi-granularity Referring Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Miao
Bennamoun
Yongsheng Gao
Lin Wang
VOS
217
62
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
218
47
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
247
6
0
23 Jul 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
165
29
0
21 Jul 2023
Divert More Attention to Vision-Language Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
233
13
0
19 Jul 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
188
7
0
17 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
182
9
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM VLM
793
311
0
07 Jul 2023
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation
Neural Networks (Neural Netw.), 2023
Yonglin Li
Jing Zhang
Xiao Teng
Long Lan
VOS VLM
190
24
0
03 Jul 2023
Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for Referring Video Object Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Meng Lan
Fu Rong
Zuchao Li
Wei Yu
Guang Dai
VOS
327
11
0
02 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
418
801
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM ObjD VLM
337
1,006
0
26 Jun 2023
Mutual Query Network for Multi-Modal Product Image Segmentation
IEEE International Conference on Multimedia and Expo (ICME), 2023
Xu Tan
Wei Feng
Zheng Zhang
Xiancong Ren
Yaoyu Li
Jing Lv
Xinshuai Zhu
Zhangang Lin
Jingping Shao
95
0
0
26 Jun 2023
A Survey on Multimodal Large Language Models
National Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM LRM
405
953
0
23 Jun 2023
WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Ze-Long Cheng
Peng Jin
Hao Li
Kehan Li
Siheng Li
Xiang Ji
Chang-rui Liu
Jie Chen
141
7
0
19 Jun 2023
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Linfeng Yuan
Miaojing Shi
Zijie Yue
Qijun Chen
VOS
187
22
0
14 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD VLM
175
12
0
14 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
235
19
0
14 Jun 2023
Referring to Screen Texts with Voice Assistants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shruti Bhargava
Anand Dhoot
I. Jonsson
Hoang Long Nguyen
Alkesh Patel
Hong-ye Yu
Vincent Renkens
161
2
0
10 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
196
4
0
07 Jun 2023
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding
Oceans (OCEANS), 2023
Tan-Sang Ha
Hai Nguyen-Truong
Tuan-Anh Vu
Sai-Kit Yeung
122
0
0
07 Jun 2023
Fine-Grained Visual Prompting
Neural Information Processing Systems (NeurIPS), 2023
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD VLM
197
96
0
07 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
252
48
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
137
29
0
06 Jun 2023
GRES: Generalized Referring Expression Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Chang Liu
Henghui Ding
Xudong Jiang
329
236
0
01 Jun 2023
Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ece Takmaz
Nicolò Brandizzi
Mario Giulianelli
Sandro Pezzelle
Raquel Fernández
193
7
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
British Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
215
8
0
30 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Guoying Gu
Yu Qiao
Hao Dong
Zhongjiang He
Shiyang Feng
VOS
266
55
0
25 May 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
267
63
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
128
3
0
24 May 2023
MMNet: Multi-Mask Network for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wan
Qingbin Liu
EgoV
214
2
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD VLM
186
9
0
24 May 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
International Conference on 3D Vision (3DV), 2023
Taiki Miyanishi
Daichi Azuma
Shuhei Kurita
M. Kawanabe
218
10
0
23 May 2023