Foundation Model's Embedded Representations May Detect Distribution Shift
Abstract
Distribution shifts between train and test datasets obscure our ability to understand the generalization capacity of neural network models. This topic is especially relevant given the success of pre-trained foundation models as starting points for transfer learning (TL) models across tasks and contexts. We present a case study of TL from a pre-trained GPT-2 model onto the Sentiment140 dataset for sentiment classification. We show that Sentiment140's test dataset is not sampled from the same distribution as its training dataset, and hence training on one and measuring performance on the other does not actually account for the model's generalization on sentiment classification.
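One way to probe for this kind of shift, sketched below, is to embed samples from the train and test splits with a pre-trained foundation model and check whether a simple classifier can tell the two splits apart; this is an illustrative stand-in, not necessarily the paper's exact procedure, and the mean-pooled GPT-2 embeddings, the logistic-regression domain classifier, and the toy placeholder texts are all assumptions of this sketch.

import numpy as np
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2Model.from_pretrained("gpt2").eval()

@torch.no_grad()
def embed(texts, batch_size=16):
    """Mean-pool GPT-2's final hidden states over non-padding tokens."""
    out = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(texts[i:i + batch_size], return_tensors="pt",
                        padding=True, truncation=True, max_length=64)
        hidden = model(**enc).last_hidden_state        # (B, T, 768)
        mask = enc["attention_mask"].unsqueeze(-1)     # (B, T, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean over real tokens
        out.append(pooled.cpu().numpy())
    return np.concatenate(out)

# Placeholder texts standing in for samples from the Sentiment140 train/test splits.
train_texts = ["just got a new puppy, best day ever", "my flight got cancelled again"]
test_texts = ["the service here was outstanding", "worst movie I have ever seen"]

X = np.concatenate([embed(train_texts), embed(test_texts)])
y = np.array([0] * len(train_texts) + [1] * len(test_texts))  # 0 = train, 1 = test

# Cross-validated accuracy well above 0.5 means the embeddings of the two splits
# are separable, which suggests a distribution shift between train and test.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=2).mean()
print(f"domain-classifier accuracy: {acc:.2f}")

In practice this check would be run on large samples from each split; near-chance accuracy is consistent with the splits sharing a distribution, while strong separability signals the kind of shift the paper reports for Sentiment140.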
