2205.00272
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
30 April 2022
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
Papers citing
"Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning"
50 / 58 papers shown
Title
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
51
0
0
29 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
50
0
0
19 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
51
0
0
24 Feb 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
65
2
0
03 Jan 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
Yaxian Wang
Henghui Ding
Shuting He
Xudong Jiang
Bifan Wei
Jun Liu
ObjD
47
1
0
03 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
152
0
0
01 Dec 2024
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
Minghong Xie
Hao Wu
Huafeng Li
Yafei Zhang
Dapeng Tao
Z. Yu
ObjD
40
1
0
31 Oct 2024
Context-Infused Visual Grounding for Art
Selina Khan
Nanne van Noord
ObjD
35
1
0
16 Oct 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
27
9
0
26 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
46
3
0
16 Sep 2024
NeIn: Telling What You Don't Want
Nhat-Tan Bui
Dinh-Hieu Hoang
Quoc-Huy Trinh
Minh-Triet Tran
Truong Nguyen
Susan Gauch
43
2
0
09 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
33
0
0
05 Sep 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
29
2
0
29 Aug 2024
Disentangle and denoise: Tackling context misalignment for video moment retrieval
Kaijing Ma
Han Fang
Xianghao Zang
Chao Ban
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
Zerun Feng
Xingsong Hou
60
1
0
14 Aug 2024
Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)
Bin Han
Yiwei Yang
A. Caspi
Bill Howe
VLM
24
1
0
01 Aug 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
38
6
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
45
4
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
33
9
0
03 Jul 2024
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
Shenghuan Sun
Gregory M. Goldgof
Alexander Schubert
Zhiqing Sun
Thomas Hartvigsen
A. Butte
Ahmed Alaa
LM&MA
42
4
0
29 May 2024
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu
Xueye Zheng
Dahun Kim
Lin Wang
51
13
0
25 May 2024
MLS-Track: Multilevel Semantic Interaction in RMOT
Zeliang Ma
Yang Song
Zhe Cui
Zhicheng Zhao
Fei Su
Delong Liu
Jingyu Wang
36
3
0
18 Apr 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
43
9
0
20 Mar 2024
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Boyu Wu
Fanjiang Xu
Qing Wang
AAML
36
0
0
02 Mar 2024
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Jiamin Luo
Jianing Zhao
Jingjing Wang
Guodong Zhou
46
0
0
29 Feb 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
24
88
0
30 Jan 2024
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Yinchao Ma
Yuyang Tang
Wenfei Yang
Tianzhu Zhang
Jinpeng Zhang
Mengxue Kang
ObjD
18
12
0
20 Jan 2024
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
Jiaxi Wang
Wenhui Hu
Xueyang Liu
Beihu Wu
Yuting Qiu
Yingying Cai
15
0
0
29 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
42
7
0
23 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
38
24
0
19 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
56
4
0
15 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
36
9
0
13 Dec 2023
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
36
50
0
06 Dec 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
34
8
0
24 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
26
9
0
22 Oct 2023
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
Dongwon Kim
Nam-Won Kim
Cuiling Lan
Suha Kwak
VLM
42
19
0
29 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
37
5
0
18 Aug 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
Yu Wu
Yan-Tao Wei
Haozhe Jasper Wang
Yongfei Liu
Sibei Yang
Xuming He
31
6
0
02 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
F. Liu
Yali Li
Shengjin Wang
38
28
0
01 Aug 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
3DPC
32
20
0
25 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
28
5
0
23 Jul 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
23
3
0
13 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
23
0
0
02 Jun 2023
Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis
Pin-Er Chen
Po-Ya Angela Wang
Hsin-Yu Chou
Yu-Hsiang Tseng
S. Hsieh
23
1
0
24 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
21
71
0
09 May 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
46
7
0
23 Mar 2023
Joint Visual Grounding and Tracking with Natural Language Specification
Li Zhou
Zikun Zhou
Kaige Mao
Zhenyu He
30
59
0
21 Mar 2023
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
25
0
28 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
74
106
0
23 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
53
63
0
29 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
27
23
0
28 Sep 2022