
Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner

Chenyou Fan
Chenjia Bai
Zhao Shan
Haoran He
Yang Zhang
Zhen Wang
Abstract

Diffusion models have demonstrated strong capabilities in modeling multi-task trajectories. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). Both are costly, owing to the substantial human effort required to collect expert data or design reward functions. To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data containing task-agnostic sub-optimal trajectories, while adapting rapidly to specific downstream tasks. In this paper, we propose SODP, a two-stage framework that learns a Diffusion Planner from Sub-Optimal data and generalizes across various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which may be sub-optimal yet offer wide data coverage. For each downstream task, we then adopt RL-based fine-tuning with task-specific rewards to quickly refine the diffusion planner toward generating action sequences with higher task-specific returns. Experimental results on multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.
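To make the two-stage recipe concrete, the following is a minimal PyTorch sketch assuming a DDPM-style denoiser over fixed-length action sequences. The abstract does not specify the architecture or the RL fine-tuning algorithm, so every name here (DenoiserMLP, pretrain_step, finetune_step, sample_plans, reward_fn) is illustrative, and the reward-weighted regression in finetune_step is one simple stand-in for reward-guided RL fine-tuning, not the paper's actual update rule.

# Minimal sketch of the SODP two-stage recipe described in the abstract.
# Assumptions (not from the paper): DDPM denoiser over flattened action
# sequences; reward-weighted regression as a stand-in for RL fine-tuning.

import torch
import torch.nn as nn

T = 100            # number of diffusion steps
H, A = 8, 4        # planning horizon, action dimension

betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class DenoiserMLP(nn.Module):
    """Predicts the noise added to a flattened action sequence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(H * A + 1, 256), nn.ReLU(),
            nn.Linear(256, H * A),
        )
    def forward(self, x, t):
        # Condition on the normalized diffusion timestep.
        t_feat = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x, t_feat], dim=-1))

def pretrain_step(model, opt, traj_batch):
    """Stage 1: task-agnostic denoising loss on sub-optimal trajectories."""
    x0 = traj_batch.reshape(traj_batch.shape[0], -1)
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    loss = ((model(xt, t) - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def finetune_step(model, opt, reward_fn, batch_size=32):
    """Stage 2: reward-weighted refinement of sampled plans (an
    illustrative stand-in for the paper's RL-based fine-tuning)."""
    with torch.no_grad():
        plans = sample_plans(model, batch_size)          # (B, H*A)
        r = reward_fn(plans.reshape(batch_size, H, A))   # (B,)
        w = torch.softmax(r, dim=0)                      # advantage-like weights
    # Re-denoise the sampled plans, weighting the loss by task reward.
    t = torch.randint(0, T, (batch_size,))
    eps = torch.randn_like(plans)
    ab = alphas_bar[t].unsqueeze(-1)
    xt = ab.sqrt() * plans + (1 - ab).sqrt() * eps
    loss = (w * ((model(xt, t) - eps) ** 2).mean(dim=-1)).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample_plans(model, n):
    """Ancestral DDPM sampling of action sequences."""
    x = torch.randn(n, H * A)
    for i in reversed(range(T)):
        t = torch.full((n,), i)
        eps_hat = model(x, t)
        alpha, ab = 1.0 - betas[i], alphas_bar[i]
        x = (x - betas[i] / (1 - ab).sqrt() * eps_hat) / alpha.sqrt()
        if i > 0:
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x

# Toy usage: pre-train on random "sub-optimal" trajectories, then
# fine-tune against a stand-in reward that prefers small actions.
model = DenoiserMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
pretrain_step(model, opt, torch.randn(64, H, A))
finetune_step(model, opt, lambda a: -a.pow(2).mean(dim=(1, 2)))

The structural point this sketch illustrates is that stage 2 reuses the pre-trained denoiser and only reshapes its output distribution with task rewards, which is consistent with the abstract's claim that a small amount of reward-labeled data suffices for fine-tuning.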

@article{fan2025_2409.19949,
  title={Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner},
  author={Chenyou Fan and Chenjia Bai and Zhao Shan and Haoran He and Yang Zhang and Zhen Wang},
  journal={arXiv preprint arXiv:2409.19949},
  year={2025}
}