
Data loaders are used by machine learning (ML) frameworks such as PyTorch and TensorFlow to apply transformations to data before feeding it to the accelerator. This operation is called data preprocessing. Data preprocessing plays an important role in the ML training workflow: if it is not efficiently pipelined with training, the GPU sits idle waiting for data, causing significant training delays. Unfortunately, existing data loaders turn out to waste GPU resources, leaving the GPU idle when using the PyTorch data loader, for example. One key source of inefficiency is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability and construct batches without regard to slow or fast samples. As a result, an entire batch is delayed by a single slow sample, stalling the training pipeline and causing head-of-line blocking.
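A minimal sketch of this effect with the standard PyTorch DataLoader, using a synthetic dataset whose per-sample preprocessing cost is emulated with time.sleep (the dataset, delay values, and batch sizes are illustrative assumptions, not taken from the paper):

```python
import time
import random

import torch
from torch.utils.data import Dataset, DataLoader


class VariableCostDataset(Dataset):
    """Synthetic dataset whose per-sample 'preprocessing' time varies widely."""

    def __init__(self, n=256, slow_fraction=0.05):
        self.n = n
        self.slow_fraction = slow_fraction  # fraction of samples that are slow

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Emulate preprocessing: most samples are fast, a few are much slower.
        delay = 0.5 if random.random() < self.slow_fraction else 0.01
        time.sleep(delay)
        return torch.randn(3, 224, 224), idx


if __name__ == "__main__":
    loader = DataLoader(VariableCostDataset(), batch_size=16, num_workers=2)

    t0 = time.perf_counter()
    for images, _ in loader:
        t1 = time.perf_counter()
        # A batch is only delivered once every one of its samples has been
        # preprocessed, so its arrival time is gated by the slowest sample
        # it happens to contain (head-of-line blocking).
        print(f"batch of {images.shape[0]} ready after {t1 - t0:.2f}s")
        t0 = t1
```

Running this, most batches arrive quickly, but any batch that happens to contain a slow sample arrives only after that sample finishes, during which a training step consuming the batches would leave the GPU idle.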