Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPS), 2025

22 September 2025

Leszek Sliwko

Jolanta Mizera-Pietraszko

Main: 11 Pages

3 Figures

11 Tables

Abstract

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.
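The abstract describes a model that keeps evolving during operation instead of being retrained from scratch. As a loose illustration of that general idea only, the sketch below keeps one small classifier alive and updates it on each arriving batch of tasks. It is not the authors' transfer learning model: it substitutes a plain online logistic regression, and the class `OnlineLogisticModel`, the helper `make_batch`, the two synthetic features and the labelling rule are all hypothetical stand-ins invented for this example, not taken from the paper or from Google Cluster Data.

```python
import math
import random


class OnlineLogisticModel:
    """Hypothetical stand-in: logistic regression updated one batch at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, batch):
        # One SGD pass over the new batch; existing weights are kept,
        # so earlier batches are never revisited.
        for x, y in batch:
            err = self.predict_proba(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err


def make_batch(rng, n):
    """Synthetic tasks (assumption): two resource requests in [0, 1];
    label 1 means the task fits, i.e. the requests sum to less than 1."""
    batch = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        batch.append((x, 1 if x[0] + x[1] < 1.0 else 0))
    return batch


def accuracy(model, batch):
    return sum(model.predict(x) == y for x, y in batch) / len(batch)


rng = random.Random(0)
model = OnlineLogisticModel(n_features=2)

# Simulate batches of tasks arriving over time; the model is updated in place.
for _ in range(200):
    model.partial_fit(make_batch(rng, 50))

acc = accuracy(model, make_batch(rng, 500))
print(f"held-out accuracy on synthetic tasks: {acc:.3f}")
```

The point of the sketch is the shape of the loop: each call to `partial_fit` starts from the current weights, so the cost of an update depends on the size of the new batch rather than on the full history. The accuracy it prints concerns the synthetic data only and says nothing about the results reported in the paper.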