Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers

Understanding the perceptual invariances of artificial neural networks is essential for improving explainability and aligning models with human vision. Metamers - stimuli that are physically distinct yet produce identical neural activations - serve as a valuable tool for investigating these invariances. We introduce a novel approach to metamer generation by leveraging ensembles of artificial neural networks, capturing shared representational subspaces across diverse architectures, including convolutional neural networks and vision transformers. To characterize the properties of the generated metamers, we employ a suite of image-based metrics that assess factors such as semantic fidelity and naturalness. Our findings show that convolutional neural networks generate more recognizable and human-like metamers, while vision transformers produce realistic but less transferable metamers, highlighting the impact of architectural biases on representational invariances.
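The core idea of metamer generation, finding an input that is physically different from a reference yet yields (near-)identical activations across an ensemble of models, can be illustrated with a toy sketch. The snippet below is a minimal, hypothetical stand-in, not the authors' implementation: each "model" is a random under-complete linear map (so a shared null space exists by construction), and plain gradient descent drives the candidate input's activations toward those of the reference under every model simultaneously. All function names and hyperparameters here are illustrative assumptions.

```python
import random

# Toy stand-in for a network's activation function: a linear map W applied to x.
# Real metamer generation would optimize an image through actual CNNs/ViTs.
def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def generate_metamer(models, x_ref, dim, steps=2000, lr=0.1, seed=0):
    """Gradient descent on x so every model's activations match those of x_ref."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(dim)]   # random starting "image"
    targets = [matvec(W, x_ref) for W in models]   # reference activations
    for _ in range(steps):
        grad = [0.0] * dim
        for W, t in zip(models, targets):
            a = matvec(W, x)
            # gradient of ||W x - t||^2 with respect to x is 2 W^T (W x - t)
            for row, a_k, t_k in zip(W, a, t):
                for j, w_kj in enumerate(row):
                    grad[j] += 2.0 * w_kj * (a_k - t_k)
        x = [x_j - lr * g_j for x_j, g_j in zip(x, grad)]
    return x

def unit_row(rng, dim):
    row = [rng.gauss(0, 1) for _ in range(dim)]
    n = sum(v * v for v in row) ** 0.5
    return [v / n for v in row]

# Ensemble of two random linear "models" mapping 6-dim inputs to 2-dim activations:
# activations can agree exactly while the inputs remain distinct.
rng = random.Random(1)
dim, hidden = 6, 2
models = [[unit_row(rng, dim) for _ in range(hidden)] for _ in range(2)]
x_ref = [rng.gauss(0, 1) for _ in range(dim)]
metamer = generate_metamer(models, x_ref, dim)

act_err = max(abs(a - t) for W in models
              for a, t in zip(matvec(W, metamer), matvec(W, x_ref)))
input_dist = sum((a - b) ** 2 for a, b in zip(metamer, x_ref)) ** 0.5
print(f"activation mismatch: {act_err:.6f}, input distance: {input_dist:.3f}")
```

Because the ensemble's activation constraints are under-complete, the optimizer converges to a point whose activations match the reference's for every model while the input itself stays clearly different, which is precisely the metamer property the abstract describes, here in a linear caricature.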
@article{boehm2025_2504.01739,
  title={Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers},
  author={Lukas Boehm and Jonas Leo Mueller and Christoffer Loeffler and Leo Schwinn and Bjoern Eskofier and Dario Zanca},
  journal={arXiv preprint arXiv:2504.01739},
  year={2025}
}