Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
1511.02283
Cited By
v1
v2
v3 (latest)
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
Title
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
224
3
0
18 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
388
542
0
11 Jan 2024
GroundingGPT:Language Enhanced Multi-modal Grounding Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
553
90
0
11 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
265
144
0
10 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
280
12
0
06 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object Detectors
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
380
13
0
29 Dec 2023
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Jiaxi Wang
Wenhui Hu
Xueyang Liu
Beihu Wu
Yuting Qiu
Yingying Cai
243
1
0
29 Dec 2023
Tracking with Human-Intent Reasoning
Jiawen Zhu
Zhi-Qi Cheng
Jun-Yan He
Chenyang Li
Bin Luo
Huchuan Lu
Yifeng Geng
Xuansong Xie
LRM
VOS
178
23
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
259
264
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
174
30
0
27 Dec 2023
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine
Kanta Kaneda
Shunya Nagashima
Ryosuke Korekata
Motonari Kambara
Komei Sugiura
231
8
0
26 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
261
25
0
25 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
215
13
0
23 Dec 2023
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Yuxiang Cai
VLM
ObjD
396
25
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
578
2,127
0
21 Dec 2023
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Qinying Liu
Wei Wu
Kecheng Zheng
Zhan Tong
Jiawei Liu
Yu Liu
Wei Chen
Zilei Wang
Yujun Shen
VLM
305
7
0
21 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
363
309
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
342
404
0
20 Dec 2023
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
354
38
0
19 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
228
41
0
19 Dec 2023
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
367
82
0
19 Dec 2023
GSVA: Generalized Segmentation via Multimodal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhuofan Xia
Dongchen Han
Yizeng Han
Xuran Pan
Shiji Song
Gao Huang
VLM
575
122
0
15 Dec 2023
Osprey: Pixel Understanding with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Yuqian Yuan
Wentong Li
Jian Liu
Dongqi Tang
Xinjie Luo
Chi Qin
Lei Zhang
Jianke Zhu
MLLM
VLM
462
140
0
15 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
243
17
0
14 Dec 2023
Tokenize Anything via Prompting
European Conference on Computer Vision (ECCV), 2023
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
211
35
0
14 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Computer Vision and Pattern Recognition (CVPR), 2023
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
287
33
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Computer Vision and Pattern Recognition (CVPR), 2023
Shuyang Sun
Runjia Li
Juil Sock
Xiuye Gu
Siyang Li
VLM
CLIP
415
54
0
12 Dec 2023
VILA: On Pre-training for Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM
VLM
575
657
0
12 Dec 2023
Interfacing Foundation Models' Embeddings
Neural Information Processing Systems (NeurIPS), 2023
Xueyan Zou
Linjie Li
Jianfeng Wang
Jianwei Yang
Mingyu Ding
...
Hao Zhang
Shilong Liu
Arul Aravinthan
Yong Jae Lee
Lijuan Wang
50
1
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
333
193
0
11 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
374
11
0
11 Dec 2023
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods
Panos Achlioptas
Alexandros Benetatos
Iordanis Fostiropoulos
Dimitris Skourtis
245
10
0
11 Dec 2023
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Jianhua Wu
B. Gao
Jincheng Gao
Jianhao Yu
Hongqing Chu
...
Xun Gong
Yi Chang
H. E. Tseng
Hong Chen
Jie Chen
305
17
0
08 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM
VLM
188
21
0
08 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
240
13
0
08 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Yuan Liu
VLM
CLIP
335
161
0
06 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo Zhang
Xiangxiang Chu
MLLM
LRM
242
30
0
05 Dec 2023
Aligning and Prompting Everything All at Once for Universal Visual Perception
Computer Vision and Pattern Recognition (CVPR), 2023
Chunjiang Ge
Chaoyou Fu
Peixian Chen
Mengdan Zhang
Ke Li
Xing Sun
Yunsheng Wu
Shaohui Lin
Rongrong Ji
VLM
ObjD
263
63
0
04 Dec 2023
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection
Sunghun Kang
Junbum Cha
Jonghwan Mun
Byungseok Roh
Chang D. Yoo
VLM
ObjD
171
2
0
04 Dec 2023
Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence
International Conference on Information Photonics (ICIP), 2023
Yajie Liu
Pu Ge
Haoxiang Ma
Shichao Fan
Qingjie Liu
Di Huang
Yunhong Wang
155
1
0
01 Dec 2023
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Lorenzo Bianchi
F. Carrara
Nicola Messina
Claudio Gennaro
Fabrizio Falchi
ObjD
309
24
0
29 Nov 2023
Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation
Minhyeok Lee
Dogyoon Lee
Jungho Lee
Suhwan Cho
Heeseung Choi
Ig-Jae Kim
Sangyoun Lee
155
0
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
147
4
0
29 Nov 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Computer Vision and Pattern Recognition (CVPR), 2023
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
386
19
0
28 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
283
467
0
28 Nov 2023
RISAM: Referring Image Segmentation via Mutual-Aware Attention Features
Mengxi Zhang
Yiming Liu
Xiangjun Yin
Huanjing Yue
Jingyu Yang
356
1
0
27 Nov 2023
Continual Referring Expression Comprehension via Dual Modular Memorization
IEEE Transactions on Image Processing (IEEE TIP), 2022
Hengtao Shen
Cheng Chen
Peng Wang
Lianli Gao
Ming Wang
Jingkuan Song
ObjD
156
5
0
25 Nov 2023
Text and Click inputs for unambiguous open vocabulary instance segmentation
British Machine Vision Conference (BMVC), 2023
Nikolai Warner
Meera Hahn
Jonathan Huang
Irfan Essa
Vighnesh Birodkar
VLM
191
0
0
24 Nov 2023
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models
Xiaoyu Yang
Lijian Xu
Hao Sun
Jiaming Song
Shaoting Zhang
ObjD
393
10
0
21 Nov 2023
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
271
79
0
20 Nov 2023
Previous
1
2
3
...
7
8
9
...
17
18
19
Next