
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

20 August 2024


Main: 6 pages; 5 figures; 2 tables; Bibliography: 1 page

Abstract

Since the release of GPT2-1.5B in 2019, large language models (LLMs) have evolved from specialized deep models into versatile foundation models. Although they demonstrate remarkable zero-shot ability, LLMs still require fine-tuning on local datasets and substantial memory for deployment at the network edge. Traditional first-order fine-tuning techniques require GPU memory that exceeds the capacity of mainstream hardware. Moreover, LLMs have expanded beyond text generation to create images, audio, video, and multi-modal content, which calls for careful investigation of efficient deployment strategies for large-scale foundation models. In response to these challenges, model fine-tuning and model-compression techniques have been developed to support the sustainable growth of LLMs by reducing both operational and capital expenditures. In this work, we provide a comprehensive overview of prevalent memory-efficient fine-tuning methods for deployment at the network edge. We also review the state-of-the-art literature on model compression, offering insights into deploying LLMs at the network edge.
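To make the memory claim concrete, the following is a rough back-of-the-envelope sketch rather than the paper's own accounting. It assumes mixed-precision training with Adam, a commonly cited budget of about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights and the two Adam moments), and it ignores activations and framework overhead. The function names and default byte counts are illustrative assumptions.

```python
def finetune_memory_gb(num_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=12):
    """Approximate model-state memory (GB) for first-order fine-tuning.

    Assumed budget (not taken from the paper): fp16 weights and gradients,
    plus fp32 master weights and two fp32 Adam moments (4 + 4 + 4 bytes).
    Activations and temporary buffers are deliberately excluded.
    """
    bytes_per_param = bytes_weights + bytes_grads + bytes_optimizer
    return num_params * bytes_per_param / 1e9


def inference_memory_gb(num_params, bytes_weights=2):
    """Approximate weight-only memory (GB) for inference at a given precision."""
    return num_params * bytes_weights / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes chosen for illustration.
    for name, n in [("1.5B", 1.5e9), ("7B", 7e9)]:
        print(name,
              "fine-tune ~", round(finetune_memory_gb(n), 1), "GB;",
              "fp16 inference ~", round(inference_memory_gb(n), 1), "GB")
```

Under these assumptions, even a 1.5B-parameter model needs roughly 24 GB for model states alone during fine-tuning, while a 7B-parameter model needs on the order of 112 GB. This gap is what motivates both memory-efficient fine-tuning and model compression.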
