SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

30 March 2025

Abstract

Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent then refines these strategies by proposing several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science tasks.
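The difference between the two variants can be illustrated with a minimal sketch. It is not the authors' implementation: the `CandidatePlan` class, the `score` field, and both function names are hypothetical, and a numeric score stands in for the LLM's judgment, which the abstract says drives selection in SPIO-S. Averaging predictions is likewise only one possible way to ensemble; the abstract does not specify the method.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CandidatePlan:
    """Hypothetical container for one end-to-end candidate plan."""
    name: str
    score: float                  # assumed ranking signal (stand-in for the LLM's choice)
    predictions: Sequence[float]  # predictions produced by running the plan


def select_single(plans: List[CandidatePlan]) -> CandidatePlan:
    """SPIO-S style: keep only the single highest-ranked plan."""
    if not plans:
        raise ValueError("no candidate plans")
    return max(plans, key=lambda p: p.score)


def ensemble_top_k(plans: List[CandidatePlan], k: int) -> List[float]:
    """SPIO-E style: average the predictions of the k highest-ranked plans."""
    if not plans:
        raise ValueError("no candidate plans")
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(plans, key=lambda p: p.score, reverse=True)[:k]
    n = len(top[0].predictions)
    if any(len(p.predictions) != n for p in top):
        raise ValueError("plans must predict the same number of rows")
    # Simple unweighted mean per row (an assumption, not the paper's rule).
    return [sum(p.predictions[i] for p in top) / len(top) for i in range(n)]


plans = [
    CandidatePlan("plan_a", 0.8, [1.0, 0.0]),
    CandidatePlan("plan_b", 0.9, [0.0, 1.0]),
    CandidatePlan("plan_c", 0.5, [1.0, 1.0]),
]
best = select_single(plans)
blended = ensemble_top_k(plans, k=2)
```

Here `best` is `plan_b`, while `blended` combines `plan_b` and `plan_a`, so the ensemble can draw on a plan that single-path selection would discard.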

@article{seo2025_2503.23314,
  title={SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science},
  author={Wonduk Seo and Juhyeon Lee and Yi Bu},
  journal={arXiv preprint arXiv:2503.23314},
  year={2025}
}