Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
International Conference on Learning Representations (ICLR), 2024
Main: 9 pages; Bibliography: 6 pages; Appendix: 13 pages; 10 figures; 7 tables
Abstract
Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity because of its aggressive updates to the data distribution. This paper aims to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions that still effectively capture the data. Specifically, we develop a new distribution matching method called GEM, which solves reverse Kullback-Leibler divergence minimization with an entropy regularizer.
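The abstract does not spell out GEM's loss, but the claim that an entropy regularizer "favors flatter distributions" can be checked on a toy problem. The sketch below assumes a single categorical distribution q matched against a fixed data distribution p by minimizing KL(q || p) - beta * H(q). This is our own illustration, not GEM's training procedure, which operates on a parameterized LLM and may use a different formulation. Setting the derivative of the Lagrangian to zero gives the minimizer q proportional to p^(1/(1+beta)), i.e. p tempered at temperature 1 + beta. All names (`entropy`, `reverse_kl`, `objective`, `closed_form_minimizer`, `p_data`, `beta`) are hypothetical.

```python
import numpy as np


def entropy(q):
    """Shannon entropy H(q) in nats for a categorical distribution."""
    q = np.asarray(q, dtype=float)
    nz = q > 0  # 0 * log 0 is taken as 0
    return float(-np.sum(q[nz] * np.log(q[nz])))


def reverse_kl(q, p):
    """Reverse KL divergence KL(q || p) = sum_i q_i * log(q_i / p_i)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    nz = q > 0
    return float(np.sum(q[nz] * np.log(q[nz] / p[nz])))


def objective(q, p, beta):
    """Toy entropy-regularized reverse KL: KL(q || p) - beta * H(q).

    Hypothetical illustration only; not the exact GEM objective.
    """
    return reverse_kl(q, p) - beta * entropy(q)


def closed_form_minimizer(p, beta):
    """Minimizer over the simplex: q proportional to p ** (1 / (1 + beta)).

    Equivalent to tempering p at temperature T = 1 + beta, which flattens it.
    """
    p = np.asarray(p, dtype=float)
    w = p ** (1.0 / (1.0 + beta))
    return w / w.sum()


p_data = np.array([0.7, 0.2, 0.05, 0.05])  # toy "data" distribution
beta = 0.5                                 # assumed regularization strength
q_star = closed_form_minimizer(p_data, beta)

print("q*:", np.round(q_star, 3))
print("H(p), H(q*):", entropy(p_data), entropy(q_star))
print("objective at p, at q*:",
      objective(p_data, p_data, beta), objective(q_star, p_data, beta))
```

With `beta = 0` the minimizer is q = p, the same optimum forward KL (CE) reaches in this unconstrained toy; any `beta > 0` moves probability mass toward the rare outcomes and raises the entropy of the fitted distribution.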
