ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.02283
  4. Cited By
Generation and Comprehension of Unambiguous Object Descriptions
v1v2v3 (latest)

Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD
ArXiv (abs)PDFHTMLGithub (164★)

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 917 papers shown
Title
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder
  Transformer Models
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
192
9
0
15 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for
  Multi-modal Large Language Models
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Shiyang Feng
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Jiaming Song
Yu Qiao
MLLMVLM
272
271
0
13 Nov 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone
  GUI Navigation
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
Kevin Qinghong Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAGLM&Ro
317
138
0
13 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
PerceptionGPT: Effectively Fusing Visual Perception into LLMComputer Vision and Pattern Recognition (CVPR), 2023
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
175
55
0
11 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision TasksComputer Vision and Pattern Recognition (CVPR), 2023
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
313
357
0
10 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in
  Clutter
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in ClutterConference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
195
39
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoEMLLMVLM
372
24
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLMVLM
299
73
0
08 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task
  Completion
Multitask Multimodal Prompted Training for Interactive Embodied Task CompletionConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
156
10
0
07 Nov 2023
CoVLM: Composing Visual Entities and Relationships in Large Language
  Models Via Communicative Decoding
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative DecodingInternational Conference on Learning Representations (ICLR), 2023
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
327
18
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
CogVLM: Visual Expert for Pretrained Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLMMLLM
599
698
0
06 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation
  for Grounding-Based Vision and Language Models
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language ModelsIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
227
2
0
05 Nov 2023
Towards Omni-supervised Referring Expression Segmentation
Towards Omni-supervised Referring Expression SegmentationIEEE International Conference on Multimedia and Expo (ICME), 2023
Minglang Huang
Weihao Ye
Gen Luo
Guannan Jiang
Weilin Zhuang
Xiaoshuai Sun
282
1
0
01 Nov 2023
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale
  Point Cloud Data
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud DataNeural Information Processing Systems (NeurIPS), 2023
Taiki Miyanishi
Fumiya Kitamori
Shuhei Kurita
Jungdae Lee
M. Kawanabe
Nakamasa Inoue
AI4TS3DPC
190
14
0
28 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open
  Environments
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open EnvironmentsNeural Information Processing Systems (NeurIPS), 2023
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
175
19
0
26 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjDVLM
228
14
0
22 Oct 2023
LanPose: Language-Instructed 6D Object Pose Estimation for Robotic
  Assembly
LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly
Bowen Fu
Sek Kun Leong
Yan Di
Jiwen Tang
Xiangyang Ji
269
5
0
20 Oct 2023
Segment, Select, Correct: A Framework for Weakly-Supervised Referring
  Segmentation
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
Francisco Eiras
Kemal Oksuz
Adel Bibi
Juil Sock
P. Dokania
260
2
0
20 Oct 2023
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
PGA: Personalizing Grasping Agents with Single Human-Robot InteractionIEEE/RJS International Conference on Intelligent RObots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Seoyun Yang
Minjoon Jung
Byoung-Tak Zhang
188
0
0
19 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with
  Cascading Collaborative Learning
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative LearningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
284
8
0
17 Oct 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
1.1K
608
0
14 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language
  Models
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLMVLM
296
66
0
13 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Ferret: Refer and Ground Anything Anywhere at Any GranularityInternational Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjDMLLMVLM
399
448
0
11 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
TextPSG: Panoptic Scene Graph Generation from Textual DescriptionsIEEE International Conference on Computer Vision (ICCV), 2023
Chengyang Zhao
Songlin Yang
Zhenfang Chen
Mingyu Ding
Chuang Gan
341
23
0
10 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized
  Instructions
InstructDET: Diversifying Referring Object Detection with Generalized InstructionsInternational Conference on Learning Representations (ICLR), 2023
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
395
16
0
08 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
Low-Resolution Self-Attention for Semantic SegmentationIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
429
10
0
08 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction TuningComputer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLMMLLM
568
4,047
0
05 Oct 2023
Pink: Unveiling the Power of Referential Comprehension for Multi-modal
  LLMs
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMsComputer Vision and Pattern Recognition (CVPR), 2023
Shiyu Xuan
Qingpei Guo
Ming Yang
Shiliang Zhang
MLLMObjD
205
51
0
01 Oct 2023
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
154
12
0
29 Sep 2023
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene GraphsConference on Robot Learning (CoRL), 2023
Haonan Chang
Kowndinya Boyalakuntla
Shiyang Lu
Siwei Cai
E. Jing
...
Shijie Geng
Adeeb Abbas
Lifeng Zhou
Kostas Bekris
Abdeslam Boularias
198
35
0
27 Sep 2023
Multi-modal Domain Adaptation for REG via Relation Transfer
Multi-modal Domain Adaptation for REG via Relation Transfer
Yifan Ding
Liqiang Wang
Boqing Gong
175
0
0
23 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary
  Instance Segmentation
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance SegmentationInternational Journal of Computer Vision (IJCV), 2023
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffMVLM
264
45
0
22 Sep 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
Multi3DRefer: Grounding Text Description to Multiple 3D ObjectsIEEE International Conference on Computer Vision (ICCV), 2023
Yiming Zhang
ZeMing Gong
Angel X. Chang
354
129
0
11 Sep 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual GroundingEuropean Conference on Computer Vision (ECCV), 2023
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
221
27
0
08 Sep 2023
Tracking Anything with Decoupled Video Segmentation
Tracking Anything with Decoupled Video SegmentationIEEE International Conference on Computer Vision (ICCV), 2023
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOSVLM
247
197
0
07 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
InstructDiffusion: A Generalist Modeling Interface for Vision TasksComputer Vision and Pattern Recognition (CVPR), 2023
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffMVLM
249
155
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex
  Visually-Grounded Referencing using Determiners
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using DeterminersIEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
167
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
267
2
0
06 Sep 2023
Dense Object Grounding in 3D Scenes
Dense Object Grounding in 3D ScenesACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
218
24
0
05 Sep 2023
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual
  Grounders
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual GroundersIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xuyang Liu
Siteng Huang
Yachen Kang
Honggang Chen
Donglin Wang
ObjD
416
18
0
03 Sep 2023
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
CoTDet: Affordance Knowledge Prompting for Task Driven Object DetectionIEEE International Conference on Computer Vision (ICCV), 2023
Jiajin Tang
Ge Zheng
Jingyi Yu
Sibei Yang
ObjD
179
39
0
03 Sep 2023
Spatial and Visual Perspective-Taking via View Rotation and Relation
  Reasoning for Embodied Reference Understanding
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference UnderstandingEuropean Conference on Computer Vision (ECCV), 2023
Cheng Shi
Sibei Yang
LRM
150
12
0
03 Sep 2023
Contrastive Grouping with Transformer for Referring Image Segmentation
Contrastive Grouping with Transformer for Referring Image SegmentationComputer Vision and Pattern Recognition (CVPR), 2023
Jiajin Tang
Ge Zheng
Cheng Shi
Sibei Yang
ViT
254
57
0
02 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
289
33
0
31 Aug 2023
GREC: Generalized Referring Expression Comprehension
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
216
34
0
30 Aug 2023
Shatter and Gather: Learning Referring Image Segmentation with Text
  Supervision
Shatter and Gather: Learning Referring Image Segmentation with Text SupervisionIEEE International Conference on Computer Vision (ICCV), 2023
Dongwon Kim
Nam-Won Kim
Cuiling Lan
Suha Kwak
VLM
255
26
0
29 Aug 2023
Referring Image Segmentation Using Text Supervision
Referring Image Segmentation Using Text SupervisionIEEE International Conference on Computer Vision (ICCV), 2023
Fang Liu
Yuhao Liu
Yuqiu Kong
Ke Xu
Lulu Zhang
Baocai Yin
Gerhard Hancke
Rynson W. H. Lau
230
46
0
28 Aug 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient
  Parameter and Memory
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and MemoryComputer Vision and Pattern Recognition (CVPR), 2023
Haiwen Diao
Bo Wan
Yanzhe Zhang
Xuecong Jia
Huchuan Lu
Long Chen
VLM
185
25
0
28 Aug 2023
Towards Unified Token Learning for Vision-Language Tracking
Towards Unified Token Learning for Vision-Language Tracking
Yaozong Zheng
Bineng Zhong
Qihua Liang
Guorong Li
Rongrong Ji
Xianxian Li
247
78
0
27 Aug 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
Beyond One-to-One: Rethinking the Referring Image SegmentationIEEE International Conference on Computer Vision (ICCV), 2023
Yutao Hu
Qixiong Wang
Wenqi Shao
Enze Xie
Zhenguo Li
Jungong Han
Ping Luo
3DV
219
65
0
26 Aug 2023
Previous
123...8910...171819
Next