Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
1511.02283
Cited By
v1
v2
v3 (latest)
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
Title
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
192
9
0
15 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Shiyang Feng
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Jiaming Song
Yu Qiao
MLLM
VLM
272
271
0
13 Nov 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
Kevin Qinghong Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAG
LM&Ro
317
138
0
13 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Computer Vision and Pattern Recognition (CVPR), 2023
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
175
55
0
11 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
313
357
0
10 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Conference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
195
39
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
372
24
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
299
73
0
08 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
156
10
0
07 Nov 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
International Conference on Learning Representations (ICLR), 2023
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
327
18
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Neural Information Processing Systems (NeurIPS), 2023
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
599
698
0
06 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
227
2
0
05 Nov 2023
Towards Omni-supervised Referring Expression Segmentation
IEEE International Conference on Multimedia and Expo (ICME), 2023
Minglang Huang
Weihao Ye
Gen Luo
Guannan Jiang
Weilin Zhuang
Xiaoshuai Sun
282
1
0
01 Nov 2023
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data
Neural Information Processing Systems (NeurIPS), 2023
Taiki Miyanishi
Fumiya Kitamori
Shuhei Kurita
Jungdae Lee
M. Kawanabe
Nakamasa Inoue
AI4TS
3DPC
190
14
0
28 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Neural Information Processing Systems (NeurIPS), 2023
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
175
19
0
26 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
228
14
0
22 Oct 2023
LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly
Bowen Fu
Sek Kun Leong
Yan Di
Jiwen Tang
Xiangyang Ji
269
5
0
20 Oct 2023
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
Francisco Eiras
Kemal Oksuz
Adel Bibi
Juil Sock
P. Dokania
260
2
0
20 Oct 2023
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
IEEE/RJS International Conference on Intelligent RObots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Seoyun Yang
Minjoon Jung
Byoung-Tak Zhang
188
0
0
19 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
284
8
0
17 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
1.1K
608
0
14 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
296
66
0
13 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
399
448
0
11 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
IEEE International Conference on Computer Vision (ICCV), 2023
Chengyang Zhao
Songlin Yang
Zhenfang Chen
Mingyu Ding
Chuang Gan
341
23
0
10 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
International Conference on Learning Representations (ICLR), 2023
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
395
16
0
08 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
429
10
0
08 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
568
4,047
0
05 Oct 2023
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Xuan
Qingpei Guo
Ming Yang
Shiliang Zhang
MLLM
ObjD
205
51
0
01 Oct 2023
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
154
12
0
29 Sep 2023
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Conference on Robot Learning (CoRL), 2023
Haonan Chang
Kowndinya Boyalakuntla
Shiyang Lu
Siwei Cai
E. Jing
...
Shijie Geng
Adeeb Abbas
Lifeng Zhou
Kostas Bekris
Abdeslam Boularias
198
35
0
27 Sep 2023
Multi-modal Domain Adaptation for REG via Relation Transfer
Yifan Ding
Liqiang Wang
Boqing Gong
175
0
0
23 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
International Journal of Computer Vision (IJCV), 2023
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM
VLM
264
45
0
22 Sep 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
IEEE International Conference on Computer Vision (ICCV), 2023
Yiming Zhang
ZeMing Gong
Angel X. Chang
354
129
0
11 Sep 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2023
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
221
27
0
08 Sep 2023
Tracking Anything with Decoupled Video Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
247
197
0
07 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM
VLM
249
155
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
IEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
167
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
267
2
0
06 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
218
24
0
05 Sep 2023
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xuyang Liu
Siteng Huang
Yachen Kang
Honggang Chen
Donglin Wang
ObjD
416
18
0
03 Sep 2023
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Jiajin Tang
Ge Zheng
Jingyi Yu
Sibei Yang
ObjD
179
39
0
03 Sep 2023
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
European Conference on Computer Vision (ECCV), 2023
Cheng Shi
Sibei Yang
LRM
150
12
0
03 Sep 2023
Contrastive Grouping with Transformer for Referring Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiajin Tang
Ge Zheng
Cheng Shi
Sibei Yang
ViT
254
57
0
02 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
289
33
0
31 Aug 2023
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
216
34
0
30 Aug 2023
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
IEEE International Conference on Computer Vision (ICCV), 2023
Dongwon Kim
Nam-Won Kim
Cuiling Lan
Suha Kwak
VLM
255
26
0
29 Aug 2023
Referring Image Segmentation Using Text Supervision
IEEE International Conference on Computer Vision (ICCV), 2023
Fang Liu
Yuhao Liu
Yuqiu Kong
Ke Xu
Lulu Zhang
Baocai Yin
Gerhard Hancke
Rynson W. H. Lau
230
46
0
28 Aug 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Computer Vision and Pattern Recognition (CVPR), 2023
Haiwen Diao
Bo Wan
Yanzhe Zhang
Xuecong Jia
Huchuan Lu
Long Chen
VLM
185
25
0
28 Aug 2023
Towards Unified Token Learning for Vision-Language Tracking
Yaozong Zheng
Bineng Zhong
Qihua Liang
Guorong Li
Rongrong Ji
Xianxian Li
247
78
0
27 Aug 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yutao Hu
Qixiong Wang
Wenqi Shao
Enze Xie
Zhenguo Li
Jungong Han
Ping Luo
3DV
219
65
0
26 Aug 2023
Previous
1
2
3
...
8
9
10
...
17
18
19
Next