
Dataset Distillation via Committee Voting

Main: 8 pages · Appendix: 18 pages · Bibliography: 2 pages · 18 figures · 25 tables
Abstract

Dataset distillation aims to synthesize a compact yet representative dataset that preserves the essential characteristics of the original data for efficient model training. Existing methods mainly focus on improving the alignment between synthetic and original data or on scaling distillation to large datasets. In this work, we propose Committee Voting for Dataset Distillation (CV-DD), an orthogonal approach that leverages the collective knowledge of multiple models to produce higher-quality distilled data. We first establish a strong baseline that achieves state-of-the-art performance through modern architectural and optimization choices. By integrating distributions and predictions from multiple models and generating high-quality soft labels, our method captures a broader range of data characteristics, reduces model-specific bias and the impact of distribution shifts, and significantly improves generalization. This voting-based strategy enhances diversity and robustness, alleviates overfitting, and improves post-evaluation performance. Extensive experiments across multiple datasets and IPC (images per class) settings demonstrate that CV-DD consistently outperforms single- and multi-model distillation methods and generalizes well to non-training-based frameworks and challenging synthetic-to-real transfer tasks. Code is available at: this https URL.
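To make the committee-voting idea concrete, the following is a minimal PyTorch sketch of one plausible ingredient: aggregating per-model softmax predictions on distilled images into soft labels. The function name, the uniform/weighted voting scheme, and the temperature parameter are illustrative assumptions, not the authors' implementation, which should be taken from the released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def committee_soft_labels(models, images, weights=None, temperature=1.0):
    """Aggregate a committee's predictions into soft labels (illustrative sketch).

    models:      list of pretrained nn.Module classifiers (the committee)
    images:      tensor of synthetic images, shape (N, C, H, W)
    weights:     optional per-model voting weights (defaults to uniform)
    temperature: softmax temperature controlling label smoothness
    """
    if weights is None:
        # Uniform voting: every committee member contributes equally.
        weights = [1.0 / len(models)] * len(models)

    soft = None
    for model, w in zip(models, weights):
        model.eval()
        # Temperature-scaled softmax gives each model's probability vote.
        probs = F.softmax(model(images) / temperature, dim=1)
        soft = w * probs if soft is None else soft + w * probs
    return soft  # (N, num_classes); rows sum to 1 when weights do
```

In a typical soft-label distillation pipeline, labels produced this way would supervise the student network via a cross-entropy or KL-divergence loss; averaging over several architectures is what reduces any single model's bias in the resulting labels.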
