Unlocking Open-Set Language Accessibility in Vision Models

14 March 2025

Abstract

Visual classifiers offer high-dimensional feature representations that are challenging to interpret and analyze. Text, in contrast, provides a more expressive and human-friendly medium for understanding and analyzing model behavior. We propose a simple yet powerful method for reformulating any visual classifier so that it can be accessed with open-set text queries without compromising its original performance. Our approach is label-free, efficient, and preserves the underlying classifier's distribution and reasoning processes. We thus unlock several text-based interpretability applications for any classifier. We apply our method to 40 visual classifiers and demonstrate two primary applications: 1) building both label-free and zero-shot concept bottleneck models, thereby making any classifier inherently interpretable, and 2) zero-shot decoding of visual features into natural language. In both applications, we achieve state-of-the-art results, greatly outperforming existing works. Our method enables text-based approaches for interpreting visual classifiers.

@article{sammani2025_2503.10981,
  title={Unlocking Open-Set Language Accessibility in Vision Models},
  author={Fawaz Sammani and Jonas Fischer and Nikos Deligiannis},
  journal={arXiv preprint arXiv:2503.10981},
  year={2025}
}
