Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation

12 April 2024

Jose Dolz

Papers citing "Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation"

Title
Register and CLS tokens yield a decoupling of local and global features in large ViTs Alexander Lappe M. Giese 14 0 0 09 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation Gabriele Rosi Fabio Cermelli VLM 25 0 0 06 May 2025
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation Yasser Benigmim Mohammad Fahes Tuan-Hung Vu Andrei Bursuc Raoul de Charette VLM 30 0 0 14 Apr 2025
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration Omar Alama A. Bhattacharya Haoyang He Seungchan Kim Yuheng Qiu Wenshan Wang Cherie Ho Nikhil Varma Keetha Sebastian A. Scherer 26 0 0 09 Apr 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation Ting Liu Siyuan Li 36 0 0 01 Apr 2025
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation Niccolo Avogaro Thomas Frick Mattia Rigotti A. Bartezzaghi Filip Janicki C. Malossi Konrad Schindler Roy Assaf MLLM VLM 53 0 0 25 Mar 2025
Towards Training-free Anomaly Detection with Vision and Language Foundation Models Jinjin Zhang Guodong Wang Yizhou Jin Di Huang 42 1 0 24 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs Mir Rayat Imtiaz Hossain Mennatullah Siam Leonid Sigal James J. Little MLLM VLM 63 0 0 13 Mar 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Mennatullah Siam VLM 76 1 0 06 Feb 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection Xiangyu Gao Yu Dai Benliu Qiu Hongliang Li Heqian Qiu Hongliang Li ObjD VLM 56 0 0 28 Jan 2025
TeD-Loc: Text Distillation for Weakly Supervised Object Localization Shakeeb Murtaza Soufiane Belharbi M. Pedersoli Eric Granger WSOL VLM 84 1 0 22 Jan 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation Luca Barsellotti Lorenzo Bianchi Nicola Messina F. Carrara Marcella Cornia Lorenzo Baraldi Fabrizio Falchi Rita Cucchiara VLM 62 2 0 28 Nov 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation Chanyoung Kim Dayun Ju Woojung Han Ming-Hsuan Yang Seong Jae Hwang VLM VOS 66 0 0 26 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference Yuhang Yang Jinhong Deng Wen Li Lixin Duan VLM 68 0 0 24 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation Sule Bai Yong-Jin Liu Yifei Han Haoji Zhang Yansong Tang VLM 72 3 0 24 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements M. Arda Aydın Efe Mert Çırpar Elvin Abdinli Gözde B. Ünal Y. Sahin VLM 59 0 0 18 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation Yuheng Shi Minjing Dong Chang Xu VLM 24 1 0 14 Nov 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain João Daniel Silva João Magalhães D. Tuia Bruno Martins CLIP VLM 25 1 0 30 Oct 2024
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers Andrew F. Luo Jacob Yeung Rushikesh Zawar Shaurya Dewan Margaret M. Henderson Leila Wehbe Michael J. Tarr 21 3 0 07 Oct 2024
Image Segmentation in Foundation Model Era: A Survey Tianfei Zhou Fei Zhang Boyu Chang Wenguan Wang Ye Yuan E. Konukoglu Daniel Cremers VLM 38 4 0 23 Aug 2024
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models Jiarui Xu Sifei Liu Arash Vahdat Wonmin Byeon Xiaolong Wang Shalini De Mello VLM 198 318 0 08 Mar 2023
GroupViT: Semantic Segmentation Emerges from Text Supervision Jiarui Xu Shalini De Mello Sifei Liu Wonmin Byeon Thomas Breuel Jan Kautz X. Wang ViT VLM 175 494 0 22 Feb 2022
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling Renrui Zhang Rongyao Fang Wei Zhang Peng Gao Kunchang Li Jifeng Dai Yu Qiao Hongsheng Li VLM 172 281 0 06 Nov 2021
Semantic Understanding of Scenes through the ADE20K Dataset Bolei Zhou Hang Zhao Xavier Puig Tete Xiao Sanja Fidler Adela Barriuso Antonio Torralba SSeg 243 1,817 0 18 Aug 2016