Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

1 July 2025

Xingyu Su

Xiner Li

Masatoshi Uehara

Sunwoo Kim

Yulai Zhao

Gabriele Scalia

Ehsan Hajiramezanali

Tommaso Biancalani

Degui Zhi

Shuiwang Ji


Main: 10 pages · Bibliography: 4 pages · Appendix: 7 pages · 5 figures · 12 tables

Abstract

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective at modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation: they require optimization with respect to potentially non-differentiable reward functions, such as physics-based simulations or rewards derived from scientific knowledge. Although reinforcement learning (RL) methods have been explored for fine-tuning diffusion models toward such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during the roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods. Empirical results demonstrate the effectiveness and superior reward optimization of our approach across diverse tasks in protein, small molecule, and regulatory DNA design.
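The roll-in, roll-out, and KL-minimization loop described in the abstract can be illustrated with a deliberately small toy. The sketch below is not the paper's algorithm: it replaces the diffusion model with a single-step categorical policy over `n_items` choices, and the names `distill`, `reward_fn`, `alpha`, `rounds`, `steps`, and `lr` are hypothetical. It also assumes the soft-optimal target takes the form q(x) ∝ p(x) exp(r(x)/alpha) and that the update minimizes the forward KL from that target to the model; the abstract does not specify either detail.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill(reward_fn, n_items, rounds=50, steps=20, lr=0.5,
            alpha=1.0, n_samples=64, seed=0):
    """Toy iterative distillation for a categorical policy (illustrative only).

    reward_fn is treated as a black box: it is only evaluated on sampled
    items and never differentiated.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_items  # start from a uniform policy
    for _ in range(rounds):
        # Roll-in: draw samples from the current model.
        probs = softmax(logits)
        samples = rng.choices(range(n_items), weights=probs, k=n_samples)
        counts = Counter(samples)

        # Roll-out: build an empirical soft-optimal target,
        # q(x) proportional to count(x) * exp(r(x) / alpha)  (assumed form).
        scores = {x: reward_fn(x) for x in counts}
        top = max(scores.values())
        weights = [
            counts[x] * math.exp((scores[x] - top) / alpha) if x in counts else 0.0
            for x in range(n_items)
        ]
        total = sum(weights)
        target = [w / total for w in weights]

        # Distillation: gradient steps on KL(target || model) w.r.t. the logits.
        # For a softmax policy, that gradient is (model_probs - target).
        for _ in range(steps):
            probs = softmax(logits)
            logits = [l - lr * (p - q) for l, p, q in zip(logits, probs, target)]
    return softmax(logits)


if __name__ == "__main__":
    final = distill(lambda x: float(x), n_items=4)
    print([round(p, 3) for p in final])
```

Because the target is fixed within each round, the inner loop is an ordinary supervised fit rather than a policy-gradient update, which is the intuition behind the stability claim; the toy makes no attempt to reproduce the paper's empirical results.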


@article{su2025_2507.00445,
  title={Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design},
  author={Xingyu Su and Xiner Li and Masatoshi Uehara and Sunwoo Kim and Yulai Zhao and Gabriele Scalia and Ehsan Hajiramezanali and Tommaso Biancalani and Degui Zhi and Shuiwang Ji},
  journal={arXiv preprint arXiv:2507.00445},
  year={2025}
}
