Referring Expression Comprehension: A Survey of Methods and Datasets

IEEE Transactions on Multimedia (TMM), 2020
19 July 2020
Yanyuan Qiao
Chaorui Deng
Qi Wu
    ObjD
arXiv: 2007.09554

Papers citing "Referring Expression Comprehension: A Survey of Methods and Datasets"

Showing 50 of 58 citing papers
Zero-Shot Referring Expression Comprehension via Vison-Language True/False Verification
Jeffrey Liu
Rongbin Hu
ObjD
231
0
0
12 Sep 2025
KnowDR-REC: A Benchmark for Referring Expression Comprehension with Real-World Knowledge
Guanghao Jin
Jingpei Wu
Tianpei Guo
Yiyi Niu
Weidong Zhou
Guoyang Liu
183
1
0
12 Aug 2025
Multimodal Human-Intent Modeling for Contextual Robot-to-Human Handovers of Arbitrary Objects
Lucas Chen
Guna Avula
Hanwen Ren
Zixing Wang
A. H. Qureshi
184
1
0
05 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
512
16
0
01 Aug 2025
CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding
Fevziye Irem Eyiokur
Dogucan Yaman
H. K. Ekenel
Alexander Waibel
304
0
0
29 Jul 2025
Improving Contrastive Learning for Referring Expression Counting
Kostas Triaridis
Panagiotis Kaliosis
E-Ro Nguyen
Aoxiang Fan
Hieu M. Le
Dimitris Samaras
SSL
190
3
0
28 May 2025
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs
Zhehan Kan
Y. Liu
Kun Yin
Xinghua Jiang
Xin Li
...
Yinsong Liu
Shihong Deng
Xing Sun
Qingmin Liao
Wenming Yang
LRM
336
1
0
27 May 2025
Human-like compositional learning of visually-grounded concepts using synthetic environments
Zijun Lin
M Ganesh Kumar
Cheston Tan
OCL, CoGe
480
0
0
09 Apr 2025
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Yang Li
Xiaoyi Feng
Maosong Sun
VLM, LRM
382
20
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuhao Wang
Yongheng Shang
Yuxiang Cai
280
7
0
16 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
975
15
0
11 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
485
4
0
24 Feb 2025
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen
Yu-Yun Tseng
Zhuoheng Li
Anush Venkatesh
Danna Gurari
375
0
0
04 Jan 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
1.1K
41
0
28 Dec 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
308
0
0
13 Nov 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
256
6
0
17 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
496
28
0
10 Oct 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
International Conference on Learning Representations (ICLR), 2024
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
388
0
0
18 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
IEEE Transactions on Multimedia (IEEE TMM), 2024
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
300
1
0
05 Sep 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Tao Huang
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
449
9
0
30 Aug 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao-Yang Liu
Tianjie Zhang
Yu Gu
Iat Long Iong
Yifan Xu
...
Zhengxiao Du
Chan Hee Song
Yu Su
Yuxiao Dong
Jie Tang
VLM, LLMAG
287
77
0
12 Aug 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
308
42
0
24 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
422
35
0
09 Jun 2024
MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
Zhaohuan Zhan
Lisha Yu
Sijie Yu
Guang Tan
LLMAG, LM&Ro
393
26
0
17 May 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
374
41
0
20 Apr 2024
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
246
0
0
16 Apr 2024
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiao-Qi Zhai
VLM
416
28
0
28 Mar 2024
J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
Nobuhiro Ueda
Hideko Habe
Yoko Matsui
Akishige Yuguchi
Seiya Kawano
Yasutomo Kawanishi
Sadao Kurohashi
Koichiro Yoshino
259
8
0
28 Mar 2024
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
Bowen Huang
Yanwei Zheng
Chuanlin Lan
Xinpeng Zhao
Yifei Zou
Dongxiao Yu
350
1
0
23 Mar 2024
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf
Elad Richardson
Sergey Tulyakov
Kfir Aberman
Daniel Cohen-Or
MLLM, VLM
442
54
0
21 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
283
117
0
20 Mar 2024
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan
Liye Jia
Fengyufan Yang
Shanliang Yao
Erick Purwanto
...
Eng Gee Lim
Jeremy S. Smith
Ka Lok Man
Xuming Hu
Yutao Yue
463
22
0
19 Mar 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
305
6
0
19 Feb 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Yuan Liu
VLM, CLIP
494
183
0
06 Dec 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Conference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
302
45
0
09 Nov 2023
Toloka Visual Question Answering Benchmark
Mert Pilanci
Nikita Pavlichenko
Sergey Koshelev
Daniil Likhobaba
Alisa Smirnova
279
7
0
28 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
285
26
0
05 Sep 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ziyan Yang
Kushal Kafle
Zhe Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
291
1
0
24 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
297
64
0
01 Aug 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
296
7
0
17 Jul 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD, VLM
549
67
0
15 May 2023
Natural Language Robot Programming: NLP integrated with autonomous robotic grasping
Muhammad Arshad Khan
Max Kenney
Jack Painter
Disha Kamale
Riza Batista-Navarro
Amir M. Ghalamzan-E.
LM&Ro
176
4
0
06 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
209
12
0
23 Mar 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
375
62
0
01 Feb 2023
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Haoxuan You
Rui Sun
Zhecan Wang
Kai-Wei Chang
Shih-Fu Chang
178
7
0
14 Dec 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
215
5
0
23 Oct 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
351
18
0
02 May 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
279
13
0
17 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
European Conference on Computer Vision (ECCV), 2022
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
248
18
0
31 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
IEEE International Conference on Robotics and Automation (ICRA), 2022
Yang Yang
Xibai Lou
Changhyun Choi
224
38
0
15 Mar 2022