Zero-Shot Grounding of Objects from Natural Language Queries

IEEE International Conference on Computer Vision (ICCV), 2019
20 August 2019
Arka Sadhu
Kan Chen
Ram Nevatia
    ObjD

Papers citing "Zero-Shot Grounding of Objects from Natural Language Queries"

50 / 90 papers shown
GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding
P. Zhang
Y. Zhang
Luxiao Xu
J. Lin
Zonghao Guo
Fengxiang Wang
Xue Yang
Kaiwen Wei
Lei Wang
ObjD
233
1
0
02 Dec 2025
Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Xin Liu
Aoyang Zhou
AAML
148
0
0
02 Nov 2025
Referring Expression Comprehension for Small Objects
Kanoko Goto
Takumi Hirose
Mahiro Ukai
Shuhei Kurita
Nakamasa Inoue
ObjD
180
1
0
04 Oct 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
194
0
0
08 Sep 2025
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
Zhan Shi
Song Wang
Junbo Chen
Jianke Zhu
350
1
0
02 Aug 2025
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Lingdong Kong
Dongyue Lu
Ao Liang
Rong Li
Yuhao Dong
Tianshuai Hu
Lai Xing Ng
Wei Tsang Ooi
Benoit R. Cottereau
VGen
372
5
0
23 Jul 2025
RemoteSAM: Towards Segment Anything for Earth Observation
Liang Yao
Fan Liu
Delong Chen
Chuanyi Zhang
Yijun Wang
Ziyun Chen
Wei Xu
Shimin Di
Yuhui Zheng
833
25
0
23 May 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
1.1K
41
0
28 Dec 2024
Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2024
Tsun-hin Cheung
Ka-Chun Fung
Songjiang Lai
Kwan-Ho Lin
Vincent To-Yee NG
K. Lam
285
0
0
28 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
301
0
0
13 Nov 2024
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Neural Information Processing Systems (NeurIPS), 2024
Haomeng Zhang
Chiao-An Yang
Raymond A. Yeh
309
7
0
29 Oct 2024
Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding
International Conference on Pattern Recognition (ICPR), 2024
Yang Liu
Daizong Liu
Wei Hu
3DPC
425
9
0
21 Oct 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
ACM Multimedia (MM), 2024
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
337
7
0
29 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
Pattern Recognition (Pattern Recogn.), 2024
Yixuan Li
Zan Wang
Wei Liang
365
4
0
24 Aug 2024
Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs
Visual Communications and Image Processing (VCIP), 2024
Jinming Liu
Yuntao Wei
Junyan Lin
Shengyang Zhao
Heming Sun
Zhibo Chen
Wenjun Zeng
Xin Jin
432
7
0
16 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
290
1
0
14 Aug 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
316
19
0
30 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
460
19
0
03 Jul 2024
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Haoyu Zhao
Wenhang Ge
Ying-Cong Chen
ObjD MLLM VLM
376
7
0
27 May 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
484
260
0
30 Jan 2024
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Yuxiang Cai
VLM ObjD
483
29
0
22 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
302
44
0
19 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
294
37
0
13 Dec 2023
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
274
3
0
12 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Conference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
298
45
0
09 Nov 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Neural Information Processing Systems (NeurIPS), 2023
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
258
21
0
26 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
421
10
0
17 Oct 2023
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
227
12
0
29 Sep 2023
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
European Conference on Computer Vision (ECCV), 2023
Cheng Shi
Sibei Yang
LRM
195
13
0
03 Sep 2023
Contrastive Grouping with Transformer for Referring Image Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiajin Tang
Ge Zheng
Cheng Shi
Sibei Yang
ViT
427
66
0
02 Sep 2023
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Changli Wu
Yiwei Ma
Qi Chen
Haowei Wang
Gen Luo
Jiayi Ji
Xiaoshuai Sun
3DV
299
38
0
31 Aug 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
343
56
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
299
7
0
23 Jul 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD VLM
543
67
0
15 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
419
187
0
09 May 2023
Open-vocabulary Object Segmentation with Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyi Li
Qinye Zhou
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
VLM
376
95
0
12 Jan 2023
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Neural Information Processing Systems (NeurIPS), 2022
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
228
62
0
25 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
260
27
0
15 Nov 2022
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Henghui Ding
Chang Liu
Suchen Wang
Xudong Jiang
370
167
0
28 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
279
207
0
23 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
462
5
0
03 Oct 2022
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning
Neurocomputing (Neurocomputing), 2022
Zhipeng Zhang
Zhimin Wei
Zhongzhen Huang
Rui Niu
Peng Wang
ObjD LRM
332
11
0
31 Jul 2022
DoRO: Disambiguation of referred object for embodied agents
IEEE Robotics and Automation Letters (RA-L), 2022
Pradip Pramanick
Chayan Sarkar
S. Paul
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
215
21
0
28 Jul 2022
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding
European Conference on Computer Vision (ECCV), 2022
Mengxue Qu
Yu Wu
Wu Liu
Qiqi Gong
Xiaodan Liang
Olga Russakovsky
Yao Zhao
Yunchao Wei
ObjD
135
26
0
27 Jul 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
291
96
0
14 Jun 2022
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution
Georgios Tziafas
S. Kasaei
333
2
0
24 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
354
165
0
30 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
279
13
0
17 Apr 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
Computer Vision and Pattern Recognition (CVPR), 2022
Jun-Bin Luo
Jiahui Fu
Xianghao Kong
Chen Gao
Haibing Ren
Hao Shen
Huaxia Xia
Si Liu
330
134
0
13 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
327
169
0
12 Apr 2022