
Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

Abstract

Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N=K$ as a special case.
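
The following is a minimal sketch of the additive fine-tuning setup described above: the fine-tuned model is $f + g$, where $f$ is the frozen pre-trained network and $g$ is a small trainable 2-layer ReLU network with $m$ hidden neurons. The choice of PyTorch, the dimensions, the stand-in for $f$, and the training loop are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn


class AdditiveFineTune(nn.Module):
    """Additive fine-tuning: output of frozen f plus a small correction network g."""

    def __init__(self, pretrained: nn.Module, in_dim: int, m: int):
        super().__init__()
        self.f = pretrained                  # frozen pre-trained network f
        for p in self.f.parameters():
            p.requires_grad_(False)
        self.g = nn.Sequential(              # 2-layer ReLU network g with m hidden neurons
            nn.Linear(in_dim, m),
            nn.ReLU(),
            nn.Linear(m, 1),
        )

    def forward(self, x):
        return self.f(x) + self.g(x)         # fine-tuned network f + g


# Hypothetical usage: train only g so that f + g matches N arbitrarily
# changed labels, while the pre-trained f stays untouched.
if __name__ == "__main__":
    in_dim, m, N = 16, 8, 8                  # m on the order of N for a 2-layer g
    f = nn.Linear(in_dim, 1)                 # stand-in for the pre-trained model
    model = AdditiveFineTune(f, in_dim, m)
    x = torch.randn(N, in_dim)               # the N samples to be relabeled
    y_new = torch.randn(N, 1)                # arbitrary new labels
    opt = torch.optim.Adam(model.g.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y_new)
        loss.backward()
        opt.step()
```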
