
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

Xingyu Su
Xiner Li
Masatoshi Uehara
Sunwoo Kim
Yulai Zhao
Gabriele Scalia
Ehsan Hajiramezanali
Tommaso Biancalani
Degui Zhi
Shuiwang Ji
Main: 10 Pages
Bibliography: 4 Pages
Appendix: 7 Pages
5 Figures
12 Tables
Abstract

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective in modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation, requiring optimization with respect to potentially non-differentiable reward functions such as physics-based simulation or rewards based on scientific knowledge. Although RL methods have been explored to fine-tune diffusion models for such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during the roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods. Empirical results demonstrate the effectiveness and superior reward optimization of our approach across diverse tasks in protein, small molecule, and regulatory DNA design.
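
The distillation step described in the abstract can be illustrated with a deliberately small toy. The sketch below is not the paper's algorithm: it replaces the diffusion model with a single categorical distribution, so there is no multi-step roll-in or roll-out, and it assumes the commonly used soft-optimal form in which the target is proportional to the current policy times exp(reward / alpha). The function names, the temperature `alpha`, and the reward values are all hypothetical. Each round builds a fixed reward-tilted target from the current model and then minimizes the KL divergence from that target to the model by gradient descent on the logits.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_optimal_target(p_model, rewards, alpha):
    """Assumed target: p*(x) proportional to p_model(x) * exp(r(x) / alpha)."""
    w = p_model * np.exp((rewards - rewards.max()) / alpha)
    return w / w.sum()

def kl(p, q):
    """KL(p || q) for categorical distributions, skipping zero-mass entries of p."""
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

def distill(logits, rewards, alpha=1.0, rounds=5, steps=200, lr=0.5):
    """Iteratively distill the reward-tilted target back into the model."""
    logits = logits.copy()
    for _ in range(rounds):
        # The target is computed once per round and held fixed (off-policy data).
        target = soft_optimal_target(softmax(logits), rewards, alpha)
        for _ in range(steps):
            # Gradient of KL(target || softmax(logits)) with respect to the logits.
            grad = softmax(logits) - target
            logits -= lr * grad
    return logits

# Hypothetical rewards for four candidate designs.
rewards = np.array([0.0, 1.0, 2.0, 0.5])
init_logits = np.zeros(4)
final_logits = distill(init_logits, rewards)

expected_reward_before = float(softmax(init_logits) @ rewards)
expected_reward_after = float(softmax(final_logits) @ rewards)
print(expected_reward_before, expected_reward_after)
```

In this toy, a smaller `alpha` tilts the target more aggressively toward high-reward items, while a larger `alpha` keeps each round's target closer to the current model.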

@article{su2025_2507.00445,
  title={Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design},
  author={Xingyu Su and Xiner Li and Masatoshi Uehara and Sunwoo Kim and Yulai Zhao and Gabriele Scalia and Ehsan Hajiramezanali and Tommaso Biancalani and Degui Zhi and Shuiwang Ji},
  journal={arXiv preprint arXiv:2507.00445},
  year={2025}
}