1511.02283
Cited By
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
Title
Advancing Referring Expression Segmentation Beyond Single Image
IEEE International Conference on Computer Vision (ICCV), 2023
YiXuan Wu
Zhao Zhang
Xie Chi
Feng Zhu
Rui Zhao
VLM
177
24
0
21 May 2023
A Topic-aware Summarization Framework with Different Modal Side Information
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Preslav Nakov
Mingzhe Li
Shen Gao
Xin Cheng
Qiang Yang
Qishen Zhang
Xin Gao
Xiangliang Zhang
263
17
0
19 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
147
3
0
19 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Neural Information Processing Systems (NeurIPS), 2023
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
270
613
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
447
150
0
18 May 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
372
55
0
15 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
223
30
0
12 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao Yang
Davide Modolo
Zhuowen Tu
Stefano Soatto
VLM
166
3
0
11 May 2023
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
Xin Chen
Houwen Peng
Jiawen Zhu
Dong Wang
Han Hu
Huchuan Lu
308
27
0
27 Apr 2023
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
International Conference on Machine Learning (ICML), 2023
Chengyue Wu
Teng Wang
Yixiao Ge
Zeyu Lu
Rui-Zhi Zhou
Ying Shan
Ping Luo
MoMe
190
42
0
27 Apr 2023
OmniLabel: A Challenging Benchmark for Language-Based Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
S. Schulter
G. VijayKumarB.
Yumin Suh
Konstantinos M. Dafnis
Zhixing Zhang
Shiyu Zhao
Dimitris N. Metaxas
ObjD
152
16
0
22 Apr 2023
Segment Everything Everywhere All at Once
Neural Information Processing Systems (NeurIPS), 2023
Xueyan Zou
Jianwei Yang
Hao Zhang
Feng Li
Linjie Li
Jianfeng Wang
Lijuan Wang
Jianfeng Gao
Yong Jae Lee
MLLM
VLM
313
660
0
13 Apr 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
IEEE International Conference on Computer Vision (ICCV), 2023
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
337
226
0
13 Apr 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
European Conference on Computer Vision (ECCV), 2023
Zhe Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
234
12
0
12 Apr 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
Conference on Robot Learning (CoRL), 2023
Mingyu Ding
Yan Xu
Zhenfang Chen
David D. Cox
Ping Luo
J. Tenenbaum
Chuang Gan
LM&Ro
175
24
0
07 Apr 2023
DATE: Domain Adaptive Product Seeker for E-commerce
Computer Vision and Pattern Recognition (CVPR), 2023
Haoyuan Li
Haojie Jiang
Tao Jin
Meng-Juan Li
Yan Chen
Zhijie Lin
Yang Zhao
Zhou Zhao
288
6
0
07 Apr 2023
Natural Language Robot Programming: NLP integrated with autonomous robotic grasping
Muhammad Arshad Khan
Max Kenney
Jack Painter
Disha Kamale
Riza Batista-Navarro
Amir M. Ghalamzan-E.
LM&Ro
128
4
0
06 Apr 2023
Zero-shot Referring Image Segmentation with Global-Local Context Features
Computer Vision and Pattern Recognition (CVPR), 2023
S. Yu
Paul Hongsuck Seo
Jeany Son
276
76
0
31 Mar 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
155
12
0
23 Mar 2023
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Morris Alper
Michael Fiman
Hadar Averbuch-Elor
VLM
LRM
193
18
0
21 Mar 2023
Joint Visual Grounding and Tracking with Natural Language Specification
Computer Vision and Pattern Recognition (CVPR), 2023
Li Zhou
Zikun Zhou
Kaige Mao
Zhenyu He
229
104
0
21 Mar 2023
Parallel Vertex Diffusion for Unified Visual Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ze-Long Cheng
Kehan Li
Peng Jin
Xiang Ji
Li-ming Yuan
Chang-rui Liu
Jie Chen
DiffM
219
36
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS
VLM
LRM
342
232
0
12 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhao Yang
Yuan Liu
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Juil Sock
177
31
0
11 Mar 2023
Referring Multi-Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
202
113
0
06 Mar 2023
Naming Objects for Vision-and-Language Manipulation
Tokuhiro Nishikawa
Kazumi Aoyama
Shunichi Sekiguchi
Takayoshi Takayanagi
Jianing Wu
Yu Ishihara
Tamaki Kojima
Jerry Jun Yokono
135
1
0
06 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
142
1
0
28 Feb 2023
Focusing On Targets For Improving Weakly Supervised Visual Grounding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
V. Pham
Nao Mishima
ObjD
176
1
0
22 Feb 2023
Connecting Vision and Language with Video Localized Narratives
Computer Vision and Pattern Recognition (CVPR), 2023
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
270
30
0
22 Feb 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Findings (Findings), 2023
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
192
2
0
17 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Journal of Computing and Information Science in Engineering (JCISE), 2023
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
312
63
0
14 Feb 2023
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval
ACM Transactions on Knowledge Discovery from Data (TKDD), 2023
Yansong Gong
Georgina Cosma
Axel Finke
ViT
275
4
0
13 Feb 2023
See Your Heart: Psychological states Interpretation through Visual Creations
Likun Yang
Xiaokun Feng
Xiaotang Chen
Shiyu Zhang
Kaiqi Huang
43
1
0
11 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
International Conference on Machine Learning (ICML), 2023
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
227
217
0
01 Feb 2023
Linguistic Query-Guided Mask Generation for Referring Image Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Zhichao Wei
Xiaohao Chen
Mingqiang Chen
Siyu Zhu
VLM
275
2
0
16 Jan 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haowei Wang
Jiayi Ji
Weihao Ye
Yongjian Wu
Xiaoshuai Sun
174
17
0
09 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Computer Vision and Pattern Recognition (CVPR), 2023
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
370
69
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
339
4
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
184
141
0
04 Jan 2023
Position-Aware Contrastive Alignment for Referring Image Segmentation
Bo Chen
Zhiwei Hu
Zhilong Ji
Jinfeng Bai
W. Zuo
203
9
0
27 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
264
324
0
21 Dec 2022
MetaCLUE: Towards Comprehensive Visual Metaphors Research
Computer Vision and Pattern Recognition (CVPR), 2022
Arjun Reddy Akula
Brenda S. Driscoll
P. Narayana
Soravit Changpinyo
Zhi-xuan Jia
...
Sugato Basu
Leonidas Guibas
William T. Freeman
Yuanzhen Li
Varun Jampani
CLIP
VLM
146
41
0
19 Dec 2022
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning
Hui Li
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Yao-Min Zhao
199
27
0
17 Dec 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Haoxuan You
Rui Sun
Zhecan Wang
Kai-Wei Chang
Shih-Fu Chang
124
6
0
14 Dec 2022
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
231
32
0
12 Dec 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
Neural Information Processing Systems (NeurIPS), 2022
Zicheng Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
Wei Ke
195
35
0
04 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
195
2
0
02 Dec 2022
Abstract Visual Reasoning with Tangram Shapes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Anya Ji
Noriyuki Kojima
N. Rush
Alane Suhr
Wai Keen Vong
Robert D. Hawkins
Yoav Artzi
LRM
173
50
0
29 Nov 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
210
39
0
28 Nov 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Neural Information Processing Systems (NeurIPS), 2022
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
157
60
0
25 Nov 2022