arXiv:2108.00205
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
31 July 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
Papers citing
"Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding"
14 / 14 papers shown
LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
X. Shi
Silin Cheng
Sirui Zhao
Yunhan Jiang
Enhong Chen
Yang Liu
Sebastien Ourselin
128
0
0
15 Nov 2025
A Simple and Better Baseline for Visual Grounding
Jingchao Wang
Wenlong Zhang
Dingjiang Huang
Hong Wang
Yefeng Zheng
ObjD
89
0
0
12 Oct 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
290
4
0
24 Feb 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
743
26
0
28 Dec 2024
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2024
Minghong Xie
Ming Wang
Huafeng Li
Yafei Zhang
Dapeng Tao
Z. Yu
ObjD
134
5
0
31 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
350
20
0
10 Oct 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
215
15
0
26 Jun 2024
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
Konstantinos Vilouras
Pedro Sanchez
Alison Q. O'Neil
Sotirios A. Tsaftaris
MedIm
473
9
0
19 Apr 2024
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
138
12
0
29 Sep 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
244
48
0
06 Jun 2023
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax
Zachary Huemann
Xin Tie
Junjie Hu
Tyler Bradshaw
147
26
0
02 Mar 2023
Transformer-based Generative Adversarial Networks in Computer Vision: A Comprehensive Survey
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
S. Dubey
Satish Kumar Singh
ViT
203
54
0
17 Feb 2023
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
IEEE Transactions on Image Processing (IEEE TIP), 2022
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
242
12
0
21 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
179
13
0
17 Apr 2022