Sharpness-Aware Minimization Can Hallucinate Minimizers
Sharpness-Aware Minimization (SAM) is widely used to seek flatter minima, which are often linked to better generalization. In its standard implementation, SAM updates the current iterate using the loss gradient evaluated at a point perturbed by a distance ρ along the normalized gradient direction. We show that, for some choices of ρ, SAM can stall at points where this shifted (perturbed-point) gradient vanishes despite a nonzero original gradient; such points are therefore not stationary points of the original loss. We call these points hallucinated minimizers, prove their existence under simple nonconvex landscape conditions (e.g., the presence of a local minimizer and a local maximizer), and establish sufficient conditions for local convergence of the SAM iterates to them. We corroborate this failure mode in neural network training and observe that it aligns with the performance degradation of SAM often seen at large ρ. Finally, as a practical safeguard, we find that a short initial SGD warm-start before enabling SAM mitigates this failure mode and reduces sensitivity to the choice of ρ.
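The stalling mechanism described above can be illustrated on a toy one-dimensional landscape. The function, step size, and value of ρ below are my own illustrative choices, not taken from the paper: f(x) = x³ − 3x has a local maximizer at x = −1 and a local minimizer at x = 1, and for a large enough ρ the SAM iterates settle at a point whose perturbed-point gradient is zero while the original gradient is not.

```python
import numpy as np

# Toy 1-D landscape with a local maximizer (x = -1) and a local
# minimizer (x = 1): f(x) = x^3 - 3x. Illustrative choice only.
def grad(x):
    return 3 * x**2 - 3

def sam_step(x, rho, lr):
    """One (full-batch) SAM update: descend the gradient evaluated at
    the point perturbed by rho along the normalized gradient
    (in 1-D, normalization reduces to the sign)."""
    g = grad(x)
    z = x + rho * np.sign(g)   # perturbed point
    return x - lr * grad(z)    # update uses the shifted gradient

x, rho, lr = -1.4, 2.5, 0.1
for _ in range(200):
    x = sam_step(x, rho, lr)

# SAM stalls near x = -1.5: the perturbed point -1.5 + 2.5 = 1.0 is the
# local minimizer, so the shifted gradient vanishes there, yet the
# original gradient grad(-1.5) = 3.75 is far from zero.
print(round(x, 4), round(grad(x), 4))  # → -1.5 3.75
```

Note how the stall point depends directly on ρ: shrinking ρ below the distance between the critical points removes this particular fixed point, which is consistent with the abstract's observation that the failure mode appears at large ρ.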