1805.03508
Cited By
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
9 May 2018
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
Papers citing
"Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding"
50 / 71 papers shown
Improving Generalized Visual Grounding with Instance-aware Joint Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Ming Dai
Wenxuan Cheng
Jiang-Jiang Liu
Lingfeng Yang
Zhenhua Feng
Wankou Yang
Jingdong Wang
ObjD
ISeg
255
4
0
17 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
170
0
0
08 Sep 2025
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai
Wenxuan Cheng
Jiedong Zhuang
Jiang-Jiang Liu
Hongshen Zhao
Zhenhua Feng
Wankou Yang
ObjD
229
3
0
05 Sep 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
963
31
0
28 Dec 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
377
9
0
16 Oct 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
ACM Multimedia (MM), 2024
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
297
5
0
29 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
Pattern Recognition (Pattern Recogn.), 2024
Yixuan Li
Zan Wang
Wei Liang
309
2
0
24 Aug 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
409
19
0
03 Jul 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
278
15
0
26 Jun 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Jinqiao Wang
ObjD
294
26
0
14 Mar 2024
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Jinyuan Li
Han Li
Di Sun
Jiahao Wang
Wenkun Zhang
Zan Wang
Gang Pan
395
17
0
15 Feb 2024
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
Jiaxi Wang
Wenhui Hu
Xueyang Liu
Beihu Wu
Yuting Qiu
Yingying Cai
280
1
0
29 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
271
41
0
19 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
266
34
0
13 Dec 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
242
30
0
24 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
283
2
0
05 Nov 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
638
6
0
18 Aug 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
215
5
0
24 Jul 2023
Incomplete Multi-view Clustering via Prototype-based Imputation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Hao Li
Yunfan Li
Mouxing Yang
Peng Hu
Dezhong Peng
Xiaocui Peng
222
65
0
26 Jan 2023
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023
Kun Li
G. Vosselman
M. Yang
219
17
0
23 Jan 2023
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
283
39
0
28 Nov 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
272
5
0
26 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
172
27
0
15 Nov 2022
Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Aditay Tripathi
Anand Mishra
Anirban Chakraborty
164
3
0
03 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
241
179
0
23 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
354
38
0
05 Oct 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
226
49
0
28 Sep 2022
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems
Taehyeong Kim
H. Song
Byoung-Tak Zhang
196
5
0
31 Jul 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
240
89
0
14 Jun 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
252
43
0
23 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
292
155
0
30 Apr 2022
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
IEEE Transactions on Image Processing (IEEE TIP), 2022
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
333
13
0
21 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
243
13
0
17 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
IEEE Transactions on Image Processing (IEEE TIP), 2022
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
153
57
0
16 Apr 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
312
65
0
16 Mar 2022
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
ACM Multimedia (ACM MM), 2022
Yang Jiao
Zequn Jie
Yue Yu
Lin Ma
Yu-Gang Jiang
OOD
227
9
0
10 Mar 2022
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
477
147
0
16 Dec 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
240
27
0
17 Oct 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
166
58
0
16 Aug 2021
Sharing Cognition: Human Gesture and Natural Language Grounding Based Planning and Navigation for Indoor Robots
Gourav Kumar
Soumyadip Maity
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
103
2
0
14 Aug 2021
A Better Loss for Visual-Textual Grounding
ACM Symposium on Applied Computing (SAC), 2021
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
175
3
0
11 Aug 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Neural Information Processing Systems (NeurIPS), 2021
Muchen Li
Leonid Sigal
ObjD
329
237
0
06 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
273
5
0
12 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
612
1,051
0
26 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
612
442
0
17 Apr 2021
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Binbin Huang
Dongze Lian
Weixin Luo
Shenghua Gao
ObjD
322
123
0
09 Apr 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
255
65
0
24 Mar 2021
SIRI: Spatial Relation Induced Network For Spatial Description Resolution
Neural Information Processing Systems (NeurIPS), 2020
Peiyao Wang
Weixin Luo
Yanyu Xu
Haojie Li
Shugong Xu
Jianyu Yang
Shenghua Gao
102
0
0
27 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
294
14
0
12 Oct 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2020
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
315
111
0
03 Sep 2020