Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding

9 May 2018
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
    ObjD

Papers citing "Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding"

50 / 71 papers shown
Improving Generalized Visual Grounding with Instance-aware Joint Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Ming Dai
Wenxuan Cheng
Jiang-Jiang Liu
Lingfeng Yang
Zhenhua Feng
Wankou Yang
Jingdong Wang
ObjD
ISeg
255
4
0
17 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
169
0
0
08 Sep 2025
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai
Wenxuan Cheng
Jiedong Zhuang
Jiang-Jiang Liu
Hongshen Zhao
Zhenhua Feng
Wankou Yang
ObjD
229
3
0
05 Sep 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
963
31
0
28 Dec 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
377
9
0
16 Oct 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
ACM Multimedia (MM), 2024
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
297
5
0
29 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
Pattern Recognition (Pattern Recogn.), 2024
Yixuan Li
Zan Wang
Wei Liang
309
2
0
24 Aug 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
408
19
0
03 Jul 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
276
15
0
26 Jun 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Fan Yang
Jinqiao Wang
Jinqiao Wang
ObjD
294
26
0
14 Mar 2024
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Jinyuan Li
Han Li
Di Sun
Jiahao Wang
Wenkun Zhang
Zan Wang
Gang Pan
394
17
0
15 Feb 2024
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Jiaxi Wang
Wenhui Hu
Xueyang Liu
Beihu Wu
Yuting Qiu
Yingying Cai
279
1
0
29 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
271
41
0
19 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
266
34
0
13 Dec 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
241
30
0
24 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
283
2
0
05 Nov 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
638
6
0
18 Aug 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
215
5
0
24 Jul 2023
Incomplete Multi-view Clustering via Prototype-based Imputation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Hao Li
Yunfan Li
Mouxing Yang
Peng Hu
Dezhong Peng
Xiaocui Peng
222
65
0
26 Jan 2023
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023
Kun Li
G. Vosselman
M. Yang
219
17
0
23 Jan 2023
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
283
39
0
28 Nov 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
272
5
0
26 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
172
27
0
15 Nov 2022
Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Aditay Tripathi
Anand Mishra
Anirban Chakraborty
164
3
0
03 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
241
179
0
23 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
354
38
0
05 Oct 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
226
49
0
28 Sep 2022
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems
Taehyeong Kim
H. Song
Byoung-Tak Zhang
196
5
0
31 Jul 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
240
89
0
14 Jun 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
252
43
0
23 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
289
155
0
30 Apr 2022
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
IEEE Transactions on Image Processing (IEEE TIP), 2022
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
333
13
0
21 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
243
13
0
17 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
IEEE Transactions on Image Processing (IEEE TIP), 2022
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
150
57
0
16 Apr 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
312
65
0
16 Mar 2022
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
ACM Multimedia (ACM MM), 2022
Yang Jiao
Zequn Jie
Yue Yu
Lin Ma
Yu-Gang Jiang
OOD
221
9
0
10 Mar 2022
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
477
147
0
16 Dec 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
234
27
0
17 Oct 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
166
58
0
16 Aug 2021
Sharing Cognition: Human Gesture and Natural Language Grounding Based Planning and Navigation for Indoor Robots
Gourav Kumar
Soumyadip Maity
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
103
2
0
14 Aug 2021
A Better Loss for Visual-Textual Grounding
ACM Symposium on Applied Computing (SAC), 2021
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
175
3
0
11 Aug 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Neural Information Processing Systems (NeurIPS), 2021
Muchen Li
Leonid Sigal
ObjD
323
237
0
06 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
273
5
0
12 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
612
1,051
0
26 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
612
442
0
17 Apr 2021
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Binbin Huang
Dongze Lian
Weixin Luo
Shenghua Gao
ObjD
322
123
0
09 Apr 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
255
65
0
24 Mar 2021
SIRI: Spatial Relation Induced Network For Spatial Description Resolution
Neural Information Processing Systems (NeurIPS), 2020
Peiyao Wang
Weixin Luo
Yanyu Xu
Haojie Li
Shugong Xu
Jianyu Yang
Shenghua Gao
102
0
0
27 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
292
14
0
12 Oct 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2020
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
311
111
0
03 Sep 2020