Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

3 February 2025
Oussama Zekri
Nicolas Boullé
Abstract

Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is available at this https URL
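For intuition about the fine-tuning setup the abstract describes, the sketch below shows a generic REINFORCE-style policy-gradient update on a non-differentiable reward. It is not the SEPO algorithm from the paper; the model interface (model.sample, model.log_prob) and reward_fn are illustrative assumptions.

import torch

def policy_gradient_step(model, optimizer, reward_fn, prompt, num_samples=8):
    # Sample discrete sequences from the current model (assumed interface).
    samples = [model.sample(prompt) for _ in range(num_samples)]

    # Score each sample with a black-box, non-differentiable reward.
    rewards = torch.tensor([reward_fn(s) for s in samples], dtype=torch.float32)

    # Subtract a mean-reward baseline to reduce gradient variance.
    advantages = rewards - rewards.mean()

    # REINFORCE: weight each sample's log-likelihood by its advantage,
    # so gradients flow only through the (differentiable) log-probabilities.
    log_probs = torch.stack([model.log_prob(s, prompt) for s in samples])
    loss = -(advantages.detach() * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()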

@article{zekri2025_2502.01384,
  title={Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods},
  author={Oussama Zekri and Nicolas Boullé},
  journal={arXiv preprint arXiv:2502.01384},
  year={2025}
}