Extending CLIP's Image-Text Alignment to Referring Image Segmentation



North American Chapter of the Association for Computational Linguistics (NAACL), 2023

14 June 2023


Papers citing "Extending CLIP's Image-Text Alignment to Referring Image Segmentation"

14 / 14 papers shown

Title
SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation Zhenjie Mao Yuhuan Yang Chaofan Ma Dongsheng Jiang Jiangchao Yao Ya Zhang Yanfeng Wang 80 0 0 11 Oct 2025
Holistic Order Prediction in Natural Scenes Pierre Musacchio Hyunmin Lee Jaesik Park 3DV 215 0 0 02 Oct 2025
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP Na Min An Inha Kang Minhyun Lee Hyunjung Shim VLM 97 0 0 27 Sep 2025
Latent Expression Generation for Referring Image Segmentation and Grounding S. Yu Joonbeom Hong Joonseok Lee Jeany Son ObjD 133 1 0 07 Aug 2025
Multimodal Referring Segmentation: A Survey Henghui Ding Song Tang Shuting He Chang-rui Liu Zuxuan Wu Yu-Gang Jiang 298 10 0 01 Aug 2025
DGTRSD & DGTRS-CLIP: A Dual-Granularity Remote Sensing Image-Text Dataset and Vision Language Foundation Model for Alignment. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025 Weizhi Chen Yupeng Deng Jin Wei Jingbo Chen Jiansheng Chen Yuman Feng Zhihao Xi Diyou Liu Kai Li Yu Meng VLM 239 2 0 25 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding. Computer Vision and Pattern Recognition (CVPR), 2025 Seil Kang Jinyeong Kim Junhyeok Kim Seong Jae Hwang VLM 240 27 0 08 Mar 2025
Towards Visual Grounding: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024 Linhui Xiao Xiaoshan Yang X. Lan Yaowei Wang Changsheng Xu ObjD 763 26 0 28 Dec 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. Neural Information Processing Systems (NeurIPS), 2024 Linhui Xiao Xiaoshan Yang Fang Peng Yaowei Wang Changsheng Xu ObjD 362 20 0 10 Oct 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras Pratik K. Mishra Irene Ballester Andrea Iaboni Bing Ye Kristine Newman Alex Mihailidis Shehroz S. Khan 185 2 0 28 Aug 2024
Image Segmentation in Foundation Model Era: A Survey Tianfei Zhou Fei Zhang Boyu Chang Wenguan Wang Ye Yuan E. Konukoglu Daniel Cremers VLM 339 27 0 23 Aug 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation Seonghoon Yu Paul Hongsuck Seo Jeany Son DiffM 324 11 0 10 Jul 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding Linhui Xiao Xiaoshan Yang Fang Peng Yaowei Wang Changsheng Xu ObjD 275 30 0 20 Apr 2024
Putting 3D Spatially Sparse Networks on a Diet Junha Lee Chris Choy Jaesik Park 3DV 207 3 0 02 Dec 2021