
Foundation Model's Embedded Representations May Detect Distribution Shift

Abstract

Distribution shifts between train and test datasets obscure our ability to understand the generalization capacity of neural network models. This topic is especially relevant given the success of pre-trained foundation models as starting points for transfer learning (TL) models across tasks and contexts. We present a case study of TL from a pre-trained GPT-2 model onto the Sentiment140 dataset for sentiment classification. We show that Sentiment140's test dataset M is not sampled from the same distribution as the training dataset P, and hence training on P and measuring performance on M does not actually account for the model's generalization on sentiment classification.
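One common way to check the abstract's central claim, that a test set M and a training set P are not drawn from the same distribution, is a classifier two-sample test on the model's embedded representations: if a simple held-out classifier can distinguish train embeddings from test embeddings better than chance, the two sets are unlikely to share a distribution. The sketch below is illustrative, not the paper's method; it assumes the GPT-2 embeddings have already been extracted into NumPy arrays, and uses a small hand-rolled logistic regression to stay self-contained.

```python
import numpy as np

def domain_classifier_auc(train_emb, test_emb, seed=0):
    """Classifier two-sample test on embedding arrays (n_samples, dim).

    Returns held-out AUC for telling train vs. test apart:
    ~0.5 suggests the same distribution; well above 0.5 suggests shift.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([train_emb, test_emb])
    y = np.concatenate([np.zeros(len(train_emb)), np.ones(len(test_emb))])
    idx = rng.permutation(len(X))          # shuffle, then split in half
    X, y = X[idx], y[idx]
    half = len(X) // 2

    # Fit a logistic regression on the first half by gradient descent.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X[:half] @ w + b)))
        g = p - y[:half]                   # gradient of the logistic loss
        w -= 0.1 * X[:half].T @ g / half
        b -= 0.1 * g.mean()

    # AUC on the held-out half: fraction of (test, train) score pairs
    # where the test-set example scores higher.
    s = X[half:] @ w + b
    pos, neg = s[y[half:] == 1], s[y[half:] == 0]
    return (pos[:, None] > neg[None, :]).mean()
```

With real data one would pass, e.g., mean-pooled GPT-2 hidden states for P and M; on synthetic embeddings, two samples from the same Gaussian yield AUC near 0.5 while a mean-shifted sample yields AUC near 1.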
