LookSharp: Attention Entropy Minimization for Test-Time Adaptation
Evan Shelhamer
Tags: OOD
Main: 4 pages · Figures: 3 · Tables: 1 · Bibliography: 1 page · Appendix: 2 pages
Abstract
Test-time adaptation (TTA) updates models during inference to reduce error under distribution shift. While entropy minimization over the output distribution has proven effective as a TTA loss, we instead study the intermediate distributions that transformers compute in their attention mechanism. We propose LookSharp, a novel TTA objective that minimizes the entropy of CLS-to-patch attention in the final layer, encouraging the model to keep its attention focused on shifted data. We demonstrate that attention entropy minimization improves robustness on ImageNet-C, that it is complementary to output entropy minimization, and that it maintains performance on clean data.
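To make the objective concrete, here is a minimal sketch in PyTorch of what attention entropy minimization combined with output entropy minimization could look like. This is not the authors' released code: the `adapt_step` helper, the `lam` weighting, and the assumption that the model's forward pass returns both logits and the final layer's attention weights are all illustrative choices, not details from the paper.

```python
import torch

def attention_entropy_loss(attn, eps=1e-8):
    # attn: [batch, heads, tokens, tokens] post-softmax attention from
    # the final transformer block; token index 0 is the CLS token.
    cls_to_patch = attn[:, :, 0, 1:]  # CLS query attending to patch keys
    # Renormalize over patches so each row is a valid distribution
    # after dropping the CLS-to-CLS entry.
    cls_to_patch = cls_to_patch / cls_to_patch.sum(dim=-1, keepdim=True)
    entropy = -(cls_to_patch * (cls_to_patch + eps).log()).sum(dim=-1)
    return entropy.mean()  # average over batch and heads

def output_entropy_loss(logits, eps=1e-8):
    # Standard (Tent-style) entropy of the output class distribution.
    probs = logits.softmax(dim=-1)
    return -(probs * (probs + eps).log()).sum(dim=-1).mean()

def adapt_step(model, optimizer, x, lam=1.0):
    # One adaptation step on an unlabeled test batch x. Assumes the
    # model returns (logits, final_layer_attention); real ViT code
    # would typically expose the attention via a forward hook.
    logits, attn = model(x)
    loss = attention_entropy_loss(attn) + lam * output_entropy_loss(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()
```

Following common TTA practice, the optimizer here might be restricted to a small parameter subset, such as the affine parameters of normalization layers; which parameters LookSharp actually updates is not specified in this abstract.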
