Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems

Prathamesh Devadiga
Main: 4 pages · Bibliography: 1 page · Appendix: 4 pages · 3 figures · 11 tables
Abstract

Traditional auto-parallelizing compilers rely on rigid heuristics and struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of compiler auto-parallelization driven by small (approximately 1B-parameter) language models. We evaluate three models (gemma3, llama3.2, and qwen2.5) with six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x and a peak speedup of 43.25x on convolution operations. We analyze scalability, verify correctness using multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient language models can serve as powerful reasoning engines for complex compiler optimization tasks.
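This page does not include the paper's kernels or generated code. As a rough illustration of the task the abstract describes, the sketch below shows the kind of source-to-source transformation a language-model-driven auto-parallelizer might be asked to produce: a serial 2D convolution (the kernel family on which the paper reports its 43.25x peak) and a hypothetical OpenMP annotation a model could propose. The function names, signatures, and the specific pragma are illustrative assumptions, not output from the paper's system.

```c
/* Illustrative sketch only; not taken from the paper. Compile with
 * -fopenmp. Shows the shape of a model-proposed parallelization:
 * the output loops of a 2D convolution carry no loop-carried
 * dependences, so they can be collapsed and split across threads,
 * while the reduction over the filter window stays sequential per
 * output element. */
#include <omp.h>

/* Serial baseline: out is H x W, in is (H+K-1) x (W+K-1), kernel is K x K. */
void conv2d_serial(int H, int W, int K,
                   const float *in, const float *kernel, float *out) {
    for (int i = 0; i < H; i++)
        for (int j = 0; j < W; j++) {
            float acc = 0.0f;
            for (int ki = 0; ki < K; ki++)
                for (int kj = 0; kj < K; kj++)
                    acc += in[(i + ki) * (W + K - 1) + (j + kj)]
                         * kernel[ki * K + kj];
            out[i * W + j] = acc;
        }
}

/* Hypothetical model-suggested rewrite: annotate the independent
 * output loops with an OpenMP pragma; each thread writes disjoint
 * elements of out, so no synchronization is needed. */
void conv2d_parallel(int H, int W, int K,
                     const float *in, const float *kernel, float *out) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < H; i++)
        for (int j = 0; j < W; j++) {
            float acc = 0.0f;
            for (int ki = 0; ki < K; ki++)
                for (int kj = 0; kj < K; kj++)
                    acc += in[(i + ki) * (W + K - 1) + (j + kj)]
                         * kernel[ki * K + kj];
            out[i * W + j] = acc;
        }
}
```

In a pipeline like the one the abstract outlines, a candidate rewrite of this kind would then be compiled, checked for correctness (e.g., under sanitizers, as the paper reports doing), and timed against baselines such as LLVM Polly, TVM, and Triton.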
