Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

International Conference on Learning Representations (ICLR), 2024
Main:9 Pages
10 Figures
Bibliography:6 Pages
7 Tables
Appendix:13 Pages
Abstract

Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity due to its aggressive updates to the data distribution. This paper aims to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions that still effectively capture the data. Specifically, we develop a new distribution matching method called GEM, which solves reverse Kullback-Leibler divergence minimization with an entropy regularizer.
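
As a rough illustration of why an entropy regularizer favors flatter distributions, the sketch below minimizes a reverse KL term minus an entropy bonus over a single categorical distribution. It is a toy example and not the GEM algorithm itself: the target `p`, the weight `gamma`, and all function names are assumptions introduced here. For a strictly positive target `p`, the minimizer of KL(q || p) - gamma * H(q) has the closed form q proportional to p^(1/(1+gamma)), which is a tempered, flatter version of `p`.

```python
import numpy as np

def entropy(q):
    """Shannon entropy (in nats) of a categorical distribution q."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return float(-np.sum(q[nz] * np.log(q[nz])))

def reverse_kl(q, p):
    """KL(q || p) for categorical q and strictly positive p."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    nz = q > 0
    return float(np.sum(q[nz] * (np.log(q[nz]) - np.log(p[nz]))))

def objective(q, p, gamma):
    """Toy objective: reverse KL minus gamma times the entropy of q."""
    return reverse_kl(q, p) - gamma * entropy(q)

def closed_form_minimizer(p, gamma):
    """Minimizer of the toy objective: q proportional to p ** (1 / (1 + gamma))."""
    w = np.asarray(p, dtype=float) ** (1.0 / (1.0 + gamma))
    return w / w.sum()

# Hypothetical peaked target distribution over three tokens.
p = np.array([0.7, 0.2, 0.1])
gamma = 0.5  # hypothetical regularization weight

q_star = closed_form_minimizer(p, gamma)
print("target p:      ", p, "entropy:", entropy(p))
print("regularized q*:", q_star, "entropy:", entropy(q_star))
```

With `gamma = 0` the minimizer is `p` itself. Increasing `gamma` moves `q_star` toward the uniform distribution while preserving the ranking of the tokens, which is the sense in which the regularized solution is flatter yet still reflects the target.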