Prompt Weight Experiments for LLM Instruction Fine-Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Abstract
We present a small study analyzing how prompt token classification loss weighting (PLW) affects the performance of 7B-parameter LLaMA models fine-tuned on instruction tasks. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2 using multiple instruction datasets. We found that the performance of models fine-tuned on our short-completion dataset follows a negative quadratic relationship with PLW, while models fine-tuned on long-completion datasets were unaffected by PLW.
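The abstract does not spell out the paper's exact loss formulation, but the core idea of prompt loss weighting can be sketched as a weighted average of per-token negative log-likelihoods, where prompt tokens receive weight PLW and completion tokens receive weight 1. The function and variable names below are illustrative, not taken from the paper's code:

```python
import math

def weighted_nll(token_logprobs, is_prompt, plw):
    """Average negative log-likelihood with prompt tokens weighted
    by plw and completion tokens weighted 1.0 (illustrative sketch,
    not the paper's implementation).

    token_logprobs: log-probability assigned to each target token
    is_prompt: True where the token belongs to the prompt
    plw: prompt loss weight, typically in [0, 1]
    """
    weights = [plw if p else 1.0 for p in is_prompt]
    total = sum(-w * lp for w, lp in zip(weights, token_logprobs))
    denom = sum(weights)
    return total / denom if denom > 0 else 0.0

# Toy sequence: 2 prompt tokens followed by 2 completion tokens.
logps = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.5)]
mask = [True, True, False, False]

loss_masked = weighted_nll(logps, mask, 0.0)   # prompt tokens ignored
loss_uniform = weighted_nll(logps, mask, 1.0)  # standard LM loss
```

With PLW = 0 this reduces to the common practice of masking prompt tokens out of the loss entirely; with PLW = 1 it is the standard causal language-modeling loss over the full sequence.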
