arXiv:1511.02283, v3 (latest)
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"
50 / 917 papers shown
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
154
58
0
16 Aug 2021
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
IEEE International Conference on Computer Vision (ICCV), 2021
Xiaoshi Wu
Hadar Averbuch-Elor
J. Sun
Noah Snavely
145
24
0
12 Aug 2021
Vision-Language Transformer and Query Generation for Referring Segmentation
IEEE International Conference on Computer Vision (ICCV), 2021
Henghui Ding
Chang-rui Liu
Suchen Wang
Xudong Jiang
237
322
0
12 Aug 2021
A Better Loss for Visual-Textual Grounding
ACM Symposium on Applied Computing (SAC), 2021
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
133
3
0
11 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
166
33
0
31 Jul 2021
Using Depth for Improving Referring Expression Comprehension in Real-World Environments
Fethiye Irmak Dogan
Iolanda Leite
188
5
0
09 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Conference on Robot Learning (CoRL), 2021
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
218
110
0
07 Jul 2021
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
Khazar Khorrami
Okko Räsänen
110
9
0
05 Jul 2021
Bridging the Gap Between Object Detection and User Intent via Query-Modulation
Marco Fornoni
Chaochao Yan
Liangchen Luo
Kimberly Wilber
A. Stark
Huayu Chen
Boqing Gong
Andrew G. Howard
ObjD
111
1
0
18 Jun 2021
CMF: Cascaded Multi-model Fusion for Referring Image Segmentation
Jianhua Yang
Yan Huang
Zhanyu Ma
Liang Wang
86
3
0
16 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
80
0
0
11 Jun 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Engineering applications of artificial intelligence (EAAI), 2021
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
161
15
0
08 Jun 2021
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Si Liu
John Y. Goulermas
ObjD
141
169
0
08 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Neural Information Processing Systems (NeurIPS), 2021
Muchen Li
Leonid Sigal
ObjD
260
236
0
06 Jun 2021
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation
Chen Liang
Yu Wu
Tianfei Zhou
Wenguan Wang
Zongxin Yang
Yunchao Wei
Yi Yang
VOS
218
58
0
02 Jun 2021
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
IEEE International Conference on Computer Vision (ICCV), 2021
Zhengyuan Yang
Songyang Zhang
Liwei Wang
Jiebo Luo
3DPC
280
156
0
24 May 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Yue Liu
Guanbin Li
EgoV
VOS
208
159
0
15 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Computer Vision and Pattern Recognition (CVPR), 2021
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
248
31
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
225
5
0
12 May 2021
Image interpretation by iterative bottom-up top-down processing
S. Ullman
Liav Assif
Alona Strugatski
B. Vatashsky
Hila Levy
Aviv Netanyahu
A. Yaari
93
5
0
12 May 2021
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Wei Suo
Mengyang Sun
Peng Wang
Qi Wu
ObjD
170
14
0
05 May 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
184
195
0
05 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
568
1,044
0
26 Apr 2021
The 5th AI City Challenge
M. Naphade
Shuo Wang
D. Anastasiu
Zheng Tang
Ming-Ching Chang
...
C. López
Anuj Sharma
Qi Feng
Vitaly Ablavsky
Stan Sclaroff
217
95
0
25 Apr 2021
Comprehensive Multi-Modal Interactions for Referring Image Segmentation
Findings (Findings), 2021
Kanishk Jain
Vineet Gandhi
188
19
0
21 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
International Journal of Computer Vision (IJCV), 2021
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
151
5
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
IEEE International Conference on Computer Vision (ICCV), 2021
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
162
31
0
20 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
451
434
0
17 Apr 2021
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Binbin Huang
Dongze Lian
Weixin Luo
Shenghua Gao
ObjD
236
121
0
09 Apr 2021
Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction
Mingjiang Liu
Chengli Xiao
Chunlin Chen
116
11
0
04 Apr 2021
Towards General Purpose Vision Systems
Computer Vision and Pattern Recognition (CVPR), 2021
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
230
55
0
01 Apr 2021
Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
Ya Jing
Tao Kong
Wei Wang
Liang Wang
Lei Li
Tieniu Tan
213
160
0
30 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
98
18
0
23 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
123
5
0
18 Mar 2021
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Ke-Jyun Wang
Yun-Hsuan Liu
Hung-Ting Su
Jen-Wei Wang
Yu-Siang Wang
Winston H. Hsu
Wen-Chin Chen
160
25
0
13 Mar 2021
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Mingjie Sun
Jimin Xiao
Eng Gee Lim
ObjD
184
42
0
09 Mar 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
IEEE International Conference on Computer Vision (ICCV), 2021
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
245
135
0
01 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Computer Vision and Pattern Recognition (CVPR), 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
1.0K
1,347
0
17 Feb 2021
Composing Pick-and-Place Tasks By Grounding Language
International Symposium on Experimental Robotics (ISER), 2021
Oier Mees
Wolfram Burgard
LM&Ro
134
37
0
16 Feb 2021
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Linwei Ye
Mrigank Rochan
Zhi Liu
Xiaoqin Zhang
Yang Wang
VOS
EgoV
123
65
0
09 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
International Conference on Machine Learning (ICML), 2021
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
573
603
0
04 Feb 2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Neural Networks (NN), 2021
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
157
23
0
22 Jan 2021
ArtEmis: Affective Language for Visual Art
Computer Vision and Pattern Recognition (CVPR), 2021
Panos Achlioptas
M. Ovsjanikov
Kilichbek Haydarov
Mohamed Elhoseiny
Leonidas Guibas
110
150
0
19 Jan 2021
Understanding in Artificial Intelligence
S. Maetschke
D. M. Iraola
Pieter Barnard
Elaheh Shafieibavani
Peter Zhong
Ying Xu
Antonio Jimeno Yepes
ELM
VLM
149
0
0
17 Jan 2021
CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions
Qi Feng
Vitaly Ablavsky
Stan Sclaroff
149
50
0
12 Jan 2021
Transformers in Vision: A Survey
ACM Computing Surveys (CSUR), 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
802
3,078
0
04 Jan 2021
PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension
Chao Yang
Guoqing Wang
Dongsheng Li
Huawei Shen
Su Feng
Bin Jiang
104
3
0
20 Dec 2020
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
International Conference on Learning Representations (ICLR), 2020
Seanie Lee
Dong Bok Lee
Sung Ju Hwang
410
117
0
14 Dec 2020
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Computer Vision and Pattern Recognition (CVPR), 2020
Dave Zhenyu Chen
A. Gholami
Matthias Nießner
Angel X. Chang
3DPC
286
226
0
03 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Transactions of the Association for Computational Linguistics (TACL), 2020
Emanuele Bugliarello
Robert Bamler
Naoaki Okazaki
Desmond Elliott
212
125
0
30 Nov 2020