How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

14 October 2024
Teng Xiao
Mingxiao Li
Yige Yuan
Huaisheng Zhu
Chao Cui
V. Honavar
Abstract

This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data. We develop GSIL by deriving a surrogate objective of imitation learning with density ratio estimates, facilitating the use of self-generated data and optimizing the imitation learning objective with simple classification losses. GSIL eliminates the need for complex adversarial training in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL encompasses a family of offline losses parameterized by a general class of convex functions for density ratio estimation and enables a unified view for alignment with demonstration data. Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, such as coding (HumanEval), mathematical reasoning (GSM8K), and instruction following (MT-Bench).
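To make the abstract's recipe concrete, below is a minimal PyTorch sketch of a classification-style loss on log density ratios between the tuned policy and a frozen reference model, contrasting demonstration completions with self-generated ones. This is a hypothetical illustration of the general idea, not the paper's exact objective: the function name gsil_style_loss, the logistic choice of convex loss, and the beta scaling are all assumptions.

    # Hypothetical sketch (not the paper's exact loss): estimate a log
    # density ratio from policy vs. reference log-likelihoods and train
    # it with a simple logistic classification loss, so that demonstration
    # data is ranked above the model's own samples.
    import torch
    import torch.nn.functional as F

    def gsil_style_loss(logp_demo, logp_demo_ref,
                        logp_self, logp_self_ref, beta=0.1):
        """Classification-style surrogate on log density ratios.

        logp_demo:     log pi_theta(y_demo | x) under the policy being tuned
        logp_demo_ref: log pi_ref(y_demo | x) under a frozen reference model
        logp_self:     log pi_theta(y_self | x) for a self-generated completion
        logp_self_ref: log pi_ref(y_self | x)
        All arguments are tensors of per-example sequence log-probabilities.
        """
        # Implied log density ratios for demonstration vs. self-generated data.
        ratio_demo = beta * (logp_demo - logp_demo_ref)
        ratio_self = beta * (logp_self - logp_self_ref)
        # Logistic loss: one convex choice from the family of offline losses
        # the abstract describes; other convex functions yield other variants.
        return -F.logsigmoid(ratio_demo - ratio_self).mean()

Because the loss only needs log-likelihoods of fixed completions, it can be optimized offline with standard supervised fine-tuning infrastructure, which is what lets this style of objective avoid the adversarial training loop of standard imitation learning.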
