Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data
For natural language processing 'text-to-text' tasks, prevailing approaches rely heavily on pretraining large self-supervised models on massive external data sources, which imposes exceptional pretraining data requirements and limits the ability to pretrain over small datasets. However, fundamental capabilities of pretraining methods, such as zero- and few-shot learning and preserving prediction performance on minority (long-tail) concepts, along with evaluation scenarios designed to measure them, remain open challenges. We therefore propose Contrastive Label-Embedding Self-Supervision (CLESS) pretraining, which pretrains from 'dataset-internal' data alone, multiple orders of magnitude smaller than external sources, while still strongly improving fully supervised, long-tail, few-shot, and self-supervised zero-shot learning. Accordingly, we analyse learning dynamics against baselines on a challenging long-tailed, low-resource, multi-label text classification scenario with noisy, highly sparse labels and many minority concepts. We find that long-tailed zero- and few-shot learning benefit markedly from increasing 'dataset-internal' self-supervised pretraining signals, which helps reduce reliance on large external data sources.
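To make the idea of contrastive label-embedding pretraining concrete, the sketch below shows one possible form of such an objective: a text encoder and a label encoder map inputs and labels into a shared space, and a binary matcher is trained to score (text, label) pairs, positive for labels present in an instance and negative for sampled absent labels. All class names, dimensions, and the negative-sampling scheme are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a contrastive label-embedding objective in the spirit
# of CLESS. Names, dimensions, and the sampling scheme are assumptions.
import torch
import torch.nn as nn


class ContrastiveLabelEmbeddingModel(nn.Module):
    def __init__(self, vocab_size: int, num_labels: int, dim: int = 128):
        super().__init__()
        # Text encoder: mean-pooled word embeddings (stand-in for any encoder).
        self.word_emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        # Label encoder: one embedding per label; embedding label *text* instead
        # would allow zero-shot matching of labels unseen during training.
        self.label_emb = nn.Embedding(num_labels, dim)
        # Matcher: scores a (text, label) embedding pair as match / no-match.
        self.matcher = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, token_ids: torch.Tensor, label_ids: torch.Tensor) -> torch.Tensor:
        text_vec = self.word_emb(token_ids)    # (batch, dim)
        label_vec = self.label_emb(label_ids)  # (batch, dim)
        return self.matcher(torch.cat([text_vec, label_vec], dim=-1)).squeeze(-1)


def contrastive_step(model, token_ids, pos_labels, num_labels, num_negatives=4):
    """One binary-contrastive step: positive pairs use true labels,
    negatives use uniformly sampled label ids (assumed sampling scheme)."""
    batch = token_ids.size(0)
    neg_labels = torch.randint(0, num_labels, (batch * num_negatives,))
    all_tokens = token_ids.repeat(num_negatives + 1, 1)
    all_labels = torch.cat([pos_labels, neg_labels])
    targets = torch.cat([torch.ones(batch), torch.zeros(batch * num_negatives)])
    logits = model(all_tokens, all_labels)
    return nn.functional.binary_cross_entropy_with_logits(logits, targets)


if __name__ == "__main__":
    model = ContrastiveLabelEmbeddingModel(vocab_size=10_000, num_labels=500)
    tokens = torch.randint(0, 10_000, (8, 32))  # toy batch of token-id sequences
    labels = torch.randint(0, 500, (8,))        # one positive label per instance
    loss = contrastive_step(model, tokens, labels, num_labels=500)
    loss.backward()
    print(f"toy contrastive loss: {loss.item():.4f}")
```

Because labels are scored through their embeddings rather than fixed output units, the same matcher can in principle score labels never seen during training, which is how a label-embedding formulation supports zero-shot prediction.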