Shadow Loss: Memory-linear deep metric learning for efficient training
Deep metric learning objectives (e.g., triplet loss) require storing and comparing high-dimensional embeddings, making the per-batch loss buffer scale as $O(b \cdot d)$, where $b$ is the number of samples in a batch and $d$ is the feature dimension, thus limiting training on memory-constrained hardware. We propose Shadow Loss, a proxy-free, parameter-free objective that measures similarity via scalar projections onto the anchor direction, reducing the loss-specific buffer from $O(b \cdot d)$ to $O(b)$ while preserving the triplet structure. We analyze gradients, provide a Lipschitz continuity bound, and show that Shadow Loss penalizes trivial collapse for stable optimization. Across fine-grained retrieval (CUB-200, CARS196), large-scale product retrieval (Stanford Online Products, In-Shop Clothes), and standard/medical benchmarks (CIFAR-10/100, Tiny-ImageNet, HAM-10K, ODIR-5K), Shadow Loss consistently outperforms recent objectives (Triplet, Soft-Margin Triplet, Angular Triplet, SoftTriple, Multi-Similarity). It also converges in fewer epochs under identical backbones and mining. Furthermore, it improves representation separability as measured by higher silhouette scores. The design is architecture-agnostic and vectorized for efficient implementation. By decoupling discriminative power from embedding dimensionality and reusing batch dot-products, Shadow Loss enables memory-linear training and faster convergence, making deep metric learning practical on both edge and large-scale systems.
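To illustrate the projection idea described above, here is a minimal PyTorch sketch of a projection-based triplet-style loss. It assumes a standard anchor/positive/negative batch; the function name, margin handling, and normalization are illustrative assumptions, not the paper's exact formulation or API. The point it shows is that the loss-specific buffer holds only $O(b)$ scalar "shadows" rather than $O(b \cdot d)$ embedding-sized terms.

```python
# Minimal sketch (not the authors' implementation): similarity is measured
# by the scalar projection of positive/negative embeddings onto the anchor
# direction, so the loss-specific buffer is two length-b vectors of scalars.
import torch
import torch.nn.functional as F

def shadow_style_triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive/negative: (b, d) embedding batches."""
    # Unit vector along each anchor (direction onto which we project).
    anchor_dir = F.normalize(anchor, dim=1)           # (b, d)
    # Scalar projections ("shadows") of positives and negatives: O(b) buffer.
    pos_shadow = (positive * anchor_dir).sum(dim=1)   # (b,)
    neg_shadow = (negative * anchor_dir).sum(dim=1)   # (b,)
    # Triplet-style hinge on the scalars: the positive's shadow should
    # exceed the negative's by at least `margin`.
    return F.relu(neg_shadow - pos_shadow + margin).mean()

# Usage sketch with random embeddings (batch size b=32, dimension d=128).
b, d = 32, 128
a, p, n = (torch.randn(b, d, requires_grad=True) for _ in range(3))
loss = shadow_style_triplet_loss(a, p, n)
loss.backward()
print(loss.item())
```

Because the projections reuse the batch dot-products and collapse each comparison to a scalar, the memory of the loss computation grows with batch size but not with embedding dimension, which is the "memory-linear" property claimed in the abstract.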