
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Abstract

To acquire instruction-following capabilities, large language models (LLMs) undergo instruction tuning, where they are trained on instruction-response pairs using next-token prediction (NTP). Efforts to improve instruction tuning often focus on higher-quality supervised fine-tuning (SFT) datasets, typically requiring data filtering with proprietary LLMs or human annotation. In this paper, we take a different approach by proposing SFTMix, a novel Mixup-based recipe that elevates LLM instruction tuning beyond the conventional NTP paradigm, without relying on well-curated datasets. Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles in instruction tuning: confident data is prone to overfitting, while unconfident data is harder to generalize from. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels, interpolates them to bridge the confidence gap, and applies a Mixup-based regularization to support learning on these additional, interpolated examples. By propagating supervision signals across confidence regions and encouraging linear behavior between them, SFTMix mitigates overfitting in confident examples while enhancing generalization in unconfident ones. We demonstrate the effectiveness of SFTMix in both instruction-following and healthcare-specific SFT tasks, with consistent improvements across LLM families and SFT datasets of varying sizes and qualities. Extensive analyses across six directions highlight SFTMix's compatibility with data selection, adaptability to compute-constrained scenarios, and scalability to broader applications.
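
To make the recipe concrete, below is a minimal sketch of the core interpolation step described in the abstract: mixing one confident and one unconfident example and interpolating their NTP losses. This is an illustration of the Mixup idea under stated assumptions, not the authors' exact implementation; it assumes a Hugging Face-style causal LM (get_input_embeddings, inputs_embeds), assumes both examples are padded to the same length, and leaves out how training dynamics are used to assign examples to the confident and unconfident pools.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_ntp_loss(model, conf_batch, unconf_batch, alpha=0.3):
    # Sample a mixing coefficient lambda ~ Beta(alpha, alpha), as in Mixup.
    lam = Beta(alpha, alpha).sample().item()

    # Mix the token embeddings of a confident and an unconfident example.
    # Assumes both batches share the same (batch, seq_len) shape.
    embed = model.get_input_embeddings()
    mixed = lam * embed(conf_batch["input_ids"]) \
        + (1.0 - lam) * embed(unconf_batch["input_ids"])

    logits = model(inputs_embeds=mixed).logits  # (batch, seq_len, vocab)

    def ntp_loss(logits, labels):
        # Standard causal-LM shift: predict token t+1 from the prefix up to t.
        shifted = logits[:, :-1, :].contiguous()
        targets = labels[:, 1:].contiguous()
        return F.cross_entropy(
            shifted.view(-1, shifted.size(-1)),
            targets.view(-1),
            ignore_index=-100,  # skip prompt/padding positions
        )

    # Interpolate the two next-token-prediction losses with the same lambda,
    # propagating supervision across confidence regions and encouraging
    # linear behavior between them.
    return lam * ntp_loss(logits, conf_batch["labels"]) \
        + (1.0 - lam) * ntp_loss(logits, unconf_batch["labels"])

In training, each optimization step would draw one example from the confident pool and one from the unconfident pool and backpropagate through this interpolated loss; the exact mixing point (embeddings vs. hidden states), the Beta parameter, and the pairing strategy are design choices the paper itself specifies.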

@article{xiao2025_2410.05248,
  title={SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe},
  author={Yuxin Xiao and Shujian Zhang and Wenxuan Zhou and Marzyeh Ghassemi and Sanqiang Zhao},
  journal={arXiv preprint arXiv:2410.05248},
  year={2025}
}