Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
31 July 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
    ObjD

Papers citing "Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding"

14 / 14 papers shown
LIHE: Linguistic Instance-Split Hyperbolic-Euclidean Framework for Generalized Weakly-Supervised Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
X. Shi
Silin Cheng
Sirui Zhao
Yunhan Jiang
Enhong Chen
Yang Liu
Sebastien Ourselin
128
0
0
15 Nov 2025
A Simple and Better Baseline for Visual Grounding
Jingchao Wang
Wenlong Zhang
Dingjiang Huang
Hong Wang
Yefeng Zheng
ObjD
89
0
0
12 Oct 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
290
4
0
24 Feb 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
743
26
0
28 Dec 2024
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2024
Minghong Xie
Ming Wang
Huafeng Li
Yafei Zhang
Dapeng Tao
Z. Yu
ObjD
134
5
0
31 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
350
20
0
10 Oct 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
215
15
0
26 Jun 2024
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
Konstantinos Vilouras
Pedro Sanchez
Alison Q. O'Neil
Sotirios A. Tsaftaris
MedIm
473
9
0
19 Apr 2024
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
138
12
0
29 Sep 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
244
48
0
06 Jun 2023
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax
Zachary Huemann
Xin Tie
Junjie Hu
Tyler Bradshaw
147
26
0
02 Mar 2023
Transformer-based Generative Adversarial Networks in Computer Vision: A Comprehensive Survey
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
S. Dubey
Satish Kumar Singh
ViT
203
54
0
17 Feb 2023
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
IEEE Transactions on Image Processing (IEEE TIP), 2022
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
242
12
0
21 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
179
13
0
17 Apr 2022