Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 917 papers shown
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM, CLIP
110
8
0
21 Nov 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Neural Information Processing Systems (NeurIPS), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
219
125
0
17 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
154
5
0
15 Nov 2022
A Comprehensive Survey of Transformers for Computer Vision
Sonain Jamil
Md. Jalil Piran
Oh-Jin Kwon
ViT
123
78
0
11 Nov 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiazhan Feng
Qingfeng Sun
Can Xu
Lu Wang
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
235
66
0
10 Nov 2022
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Henghui Ding
Chang Liu
Suchen Wang
Xudong Jiang
276
152
0
28 Oct 2022
Multilingual Multimodal Learning with Machine Translated Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Qiu
Dan Oneaţă
Emanuele Bugliarello
Stella Frank
Desmond Elliott
249
17
0
24 Oct 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
126
9
0
24 Oct 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Jiaming Chen
Weihua Luo
Ran Song
Xiaolin K. Wei
Lin Ma
Wei Emma Zhang
3DV
273
7
0
22 Oct 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
135
17
0
21 Oct 2022
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Neural Information Processing Systems (NeurIPS), 2022
Pengfei Li
Beiwen Tian
Yongliang Shi
Xiaoxue Chen
Hao Zhao
Guyue Zhou
Ya Zhang
229
29
0
19 Oct 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Weixi Feng
Tsu-Jui Fu
Yujie Lu
William Yang Wang
274
5
0
18 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
International Conference on Learning Representations (ICLR), 2022
Yongqian Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
246
20
0
11 Oct 2022
Video Referring Expression Comprehension via Transformer with Content-aware Query
Ji Jiang
Meng Cao
Tengtao Song
Yuexian Zou
242
5
0
06 Oct 2022
Embodied Referring Expression for Manipulation Question Answering in Interactive Environment
IEEE International Conference on Robotics and Automation (ICRA), 2022
Qie Sima
Sinan Tan
Huaping Liu
LM&Ro
153
8
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
314
37
0
05 Oct 2022
Two Video Data Sets for Tracking and Retrieval of Out of Distribution Objects
Asian Conference on Computer Vision (ACCV), 2022
Kira Maag
Robin Shing Moon Chan
Svenja Uhlemeyer
K. Kowol
Hanno Gottschalk
244
21
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Computer Vision and Pattern Recognition (CVPR), 2022
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
149
24
0
04 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
287
5
0
03 Oct 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
166
42
0
28 Sep 2022
Towards Robust Referring Image Segmentation
Jianzong Wu
Xiangtai Li
Xia Li
Henghui Ding
Yu Tong
Dacheng Tao
3DV
269
57
0
20 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
274
159
0
07 Sep 2022
Multi-Modal Experience Inspired AI Creation
ACM Multimedia (ACM MM), 2022
Qian Cao
Xu Chen
Ruihua Song
Hao Jiang
Guangyan Yang
Bo Zhao
126
3
0
02 Sep 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Journal of Imaging (JI), 2022
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
104
1
0
23 Aug 2022
Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2
Xinghui Zhou
Xin Jin
Jianwen Lv
Heng Huang
Ming Mao
Shuai Cui
CoGe
103
0
0
09 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM, LRM, VPVLM
172
37
0
04 Aug 2022
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning
Neurocomputing (Neurocomputing), 2022
Zhipeng Zhang
Zhimin Wei
Zhongzhen Huang
Rui Niu
Peng Wang
ObjD, LRM
252
9
0
31 Jul 2022
Visual Recognition by Request
Computer Vision and Pattern Recognition (CVPR), 2022
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
212
16
0
28 Jul 2022
Skimming, Locating, then Perusing: A Human-Like Framework for Natural Language Video Localization
ACM Multimedia (ACM MM), 2022
Daizong Liu
Wei Hu
173
43
0
27 Jul 2022
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding
European Conference on Computer Vision (ECCV), 2022
Mengxue Qu
Yu Wu
Wu Liu
Qiqi Gong
Xiaodan Liang
Olga Russakovsky
Yao Zhao
Yunchao Wei
ObjD
109
26
0
27 Jul 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
245
12
0
25 Jul 2022
Correspondence Matters for Video Referring Expression Comprehension
ACM Multimedia (ACM MM), 2022
Meng Cao
Ji Jiang
Long Chen
Yuexian Zou
VOS
261
21
0
21 Jul 2022
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
European Conference on Computer Vision (ECCV), 2022
Shiyu Zhao
Zhixing Zhang
S. Schulter
Long Zhao
Vijay Kumar B.G
Anastasis Stathopoulos
Manmohan Chandraker
Dimitris N. Metaxas
VLM, ObjD
163
121
0
18 Jul 2022
Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xuejing Liu
Liang Li
Shuhui Wang
Zhengjun Zha
Dechao Meng
Qi Tian
Qingming Huang
185
72
0
18 Jul 2022
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases
Zhihao Yuan
Xu Yan
Zhuo Li
Xuhao Li
Yao Guo
Shuguang Cui
Zhen Li
144
18
0
05 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Signal Processing: Image Communication (SPIC), 2022
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
199
9
0
04 Jul 2022
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus
IEEE International Conference on Computer Vision (ICCV), 2022
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiao Li
Bhiksha Raj
Yan Lu
VOS
264
56
0
04 Jul 2022
The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation
Leilei Cao
Zhuang Li
Bo Yan
Feng Zhang
Fengliang Qi
Yucheng Hu
Hongbin Wang
VOS
149
3
0
24 Jun 2022
Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution
Chonghan Chen
Qi Jiang
Chih-Hao Wang
Noel Chen
Haohan Wang
Xiang Li
Bhiksha Raj
ObjD
248
0
0
18 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD, VLM, MLLM
353
472
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
329
121
0
16 Jun 2022
RefCrowd: Grounding the Target in Crowd with Referring Expressions
ACM Multimedia (ACM MM), 2022
Heqian Qiu
Hongliang Li
Taijin Zhao
Lanxiao Wang
Qingbo Wu
Fanman Meng
ObjD
165
9
0
16 Jun 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
204
86
0
14 Jun 2022
Referring Image Matting
Computer Vision and Pattern Recognition (CVPR), 2022
Jizhizi Li
Jing Zhang
Dacheng Tao
ObjD, VLM
204
31
0
10 Jun 2022
Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Li Mingzhe
Xiexiong Lin
Preslav Nakov
Jinxiong Chang
Qishen Zhang
...
Taifeng Wang
Zhongyi Liu
Wei Chu
Dongyan Zhao
Rui Yan
338
14
0
26 May 2022
Guiding Visual Question Answering with Attention Priors
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
T. Le
Vuong Le
Sunil R. Gupta
Svetha Venkatesh
T. Tran
182
8
0
25 May 2022
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution
Georgios Tziafas
S. Kasaei
241
2
0
24 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM, MLLM
241
265
0
24 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM, MLLM
221
43
0
23 May 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM, ViT
358
11
0
19 May 2022