
R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning

Main: 8 pages · 7 figures · 8 tables · Bibliography: 2 pages · Appendix: 4 pages
Abstract

Fine-tuning large language models (LLMs) is prohibitively expensive in terms of computational and memory costs. Low-rank Adaptation (LoRA), one of the most popular parameter-efficient fine-tuning (PEFT) methods, offers a cost-effective alternative by approximating the model update $\Delta W \in \mathbb{R}^{m \times n}$ as the product of a down-projection matrix $A \in \mathbb{R}^{m \times r}$ and a head matrix $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$. In real-world scenarios, LLMs are fine-tuned on data from multiple domains to perform tasks across various fields, embodying multi-task learning (MTL). LoRA often underperforms in such complex scenarios. To enhance LoRA's capability in multi-task learning, we propose R-LoRA, which incorporates Multi-Head Randomization. Multi-Head Randomization diversifies the head matrices through Multi-Head Random Initialization and Multi-Head Dropout, enabling more efficient learning of task-specific features while maintaining shared knowledge representation. Extensive experiments demonstrate that R-LoRA is better at capturing task-specific knowledge, thereby improving performance in multi-task scenarios. The code is available at this https URL.
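To make the abstract's setup concrete, here is a minimal, hypothetical sketch of a multi-head LoRA update in the spirit described above: a shared down-projection $A$ feeds several head matrices $B_i$, each randomly initialized and given its own dropout. This is an illustrative assumption, not the authors' implementation; the class name, hyperparameters, and the choice to average head outputs are all invented for the example.

```python
import numpy as np

class MultiHeadLoRA:
    """Hypothetical sketch of a multi-head LoRA update (not the authors' code).

    A shared down-projection A maps inputs to rank r; k head matrices B_i map
    back up to n dimensions. Heads are randomly initialized (multi-head random
    initialization) and each head's input is dropped out independently
    (multi-head dropout), nudging heads toward diverse, task-specific updates.
    """

    def __init__(self, m, n, r=8, num_heads=4, dropout_p=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Shared down-projection A in R^{m x r}
        self.A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, r))
        # Multi-head random init: each head B_i in R^{r x n} gets its own
        # random start (vanilla LoRA would zero-initialize a single B).
        self.B = rng.normal(0.0, 0.02, size=(num_heads, r, n))
        self.p = dropout_p
        self.rng = rng

    def delta(self, x, train=True):
        """Low-rank update: average over heads of dropout(x @ A) @ B_i."""
        h = x @ self.A                                  # (batch, r)
        outs = []
        for Bi in self.B:
            hi = h
            if train and self.p > 0:
                mask = self.rng.random(hi.shape) >= self.p
                hi = hi * mask / (1.0 - self.p)         # inverted dropout, per head
            outs.append(hi @ Bi)                        # (batch, n)
        return np.mean(outs, axis=0)

x = np.ones((2, 16))
layer = MultiHeadLoRA(m=16, n=32, r=4, num_heads=2)
dy = layer.delta(x, train=False)
print(dy.shape)  # (2, 32)
```

At evaluation time (`train=False`) dropout is disabled, so the update is deterministic; during training, each head sees a different dropout mask, which is one plausible way to realize the diversification the abstract describes.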
