RuleFlow: Generating Reusable Program Optimizations with LLMs

6 February 2026

Avaljot Singh

Dushyant Bharadwaj

Stefanos Baziotis

Kaushik Varadharajan

Charith Mendis


Main: 8 Pages

11 Figures

Bibliography: 2 Pages

2 Tables

Appendix: 14 Pages

Abstract

Optimizing Pandas programs is a challenging problem. Existing systems and compiler-based approaches offer reliability but are either heavyweight or support only a limited set of optimizations. Conversely, using LLMs to optimize each program individually can synthesize nontrivial optimizations, but is unreliable, expensive, and offers a low yield. In this work, we introduce RuleFlow, a hybrid three-stage approach that decouples discovery from deployment and connects them via a novel bridge. First, it discovers per-program optimizations (discovery). Second, it converts these optimizations into generalized rewrite rules (bridge). Finally, it incorporates these rules into a compiler that applies them automatically wherever applicable, eliminating repeated reliance on LLMs (deployment). We demonstrate that RuleFlow is the new state-of-the-art (SOTA) Pandas optimization framework on PandasBench, a challenging Pandas benchmark consisting of Python notebooks. Across these notebooks, we achieve a speedup of up to 4.3x over Dias, the previous compiler-based SOTA, and 1914.9x over Modin, the previous systems-based SOTA. Our code is available at this https URL.
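To make the idea of a generalized rewrite rule concrete, the following is a minimal sketch of how one such rule could be applied to Python source. It is not taken from RuleFlow: the rule itself (rewriting `X.sort_values(C).head(N)` to `X.nsmallest(N, C)`), the class name `SortHeadToNSmallest`, and the helper `apply_rule` are all assumptions chosen for illustration, and the sketch uses only Python's standard `ast` module.

```python
import ast


class SortHeadToNSmallest(ast.NodeTransformer):
    """Hypothetical rule (not from the paper):
    X.sort_values(C).head(N)  ->  X.nsmallest(N, C)
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        outer = node.func
        # Match the outer call: <expr>.head(N) with exactly one positional argument.
        if not (isinstance(outer, ast.Attribute) and outer.attr == "head"
                and len(node.args) == 1 and not node.keywords):
            return node
        inner = outer.value
        # Match the inner call: X.sort_values(C) with exactly one positional argument.
        if not (isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Attribute)
                and inner.func.attr == "sort_values"
                and len(inner.args) == 1 and not inner.keywords):
            return node
        # Build the replacement call X.nsmallest(N, C).
        new_call = ast.Call(
            func=ast.Attribute(value=inner.func.value, attr="nsmallest",
                               ctx=ast.Load()),
            args=[node.args[0], inner.args[0]],
            keywords=[],
        )
        return ast.copy_location(new_call, node)


def apply_rule(source: str) -> str:
    """Apply the hypothetical rule to every matching call site in `source`."""
    tree = SortHeadToNSmallest().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


print(apply_rule("top = df.sort_values('price').head(5)"))
```

This sketch matches purely on syntax. A rule deployed in a real compiler would also need preconditions that are omitted here, for example that `X` is actually a DataFrame, that the column dtype is one `nsmallest` supports, and that the ordering of ties is acceptable.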
