CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches rely heavily on per-sample feature similarity, which risks overfitting, whereas contrastive approaches emphasize inter-class discrimination at the expense of intra-sample semantic relationships. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. Specifically, our method first enforces intra-sample alignment by directly minimizing teacher-student logit discrepancies within each sample. Then, it uses inter-sample contrasts to preserve semantic dissimilarities across samples. By defining positive pairs as teacher-student logits from the same sample and negative pairs as cross-sample logit combinations, we reformulate these dual constraints as an InfoNCE loss, keeping computational complexity below quadratic in the number of samples while eliminating the dependence on temperature parameters and large batch sizes. We conduct comprehensive experiments on three benchmark datasets, CIFAR-100, ImageNet-1K, and MS COCO, and the results clearly confirm the effectiveness of the proposed method on image classification, object detection, and instance segmentation tasks.
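To make the pairing scheme concrete, below is a minimal PyTorch sketch of a sample-wise contrastive loss in the spirit of the abstract: the teacher and student logits of the same sample form the positive pair, and cross-sample logit combinations serve as negatives. This is an illustrative InfoNCE-style formulation, not the authors' exact loss (which reportedly avoids a temperature hyperparameter); the cosine similarity, the `temperature` argument, and the function name are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F


def sample_wise_contrastive_kd(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               temperature: float = 1.0) -> torch.Tensor:
    """InfoNCE-style loss over a batch of logits with shape (B, num_classes)."""
    # Normalize so that similarity reduces to cosine similarity of logit vectors.
    s = F.normalize(student_logits, dim=1)   # (B, C)
    t = F.normalize(teacher_logits, dim=1)   # (B, C)

    # Pairwise similarity between every student sample and every teacher sample.
    sim = s @ t.t() / temperature            # (B, B)

    # Positive pairs lie on the diagonal (student and teacher logits of the
    # same sample); off-diagonal entries are the cross-sample negatives.
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    B, C = 8, 100  # hypothetical batch size and class count (e.g., CIFAR-100)
    student = torch.randn(B, C, requires_grad=True)
    teacher = torch.randn(B, C)
    loss = sample_wise_contrastive_kd(student, teacher.detach())
    loss.backward()
    print(float(loss))
```

In this toy formulation the cross-entropy over the similarity matrix simultaneously pulls each student-teacher pair from the same sample together (intra-sample alignment) and pushes cross-sample pairs apart (inter-sample contrast), which mirrors the dual constraints described above.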
@article{zhu2025_2404.14109,
  title   = {CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective},
  author  = {Wencheng Zhu and Xin Zhou and Pengfei Zhu and Yu Wang and Qinghua Hu},
  journal = {arXiv preprint arXiv:2404.14109},
  year    = {2025}
}