Are Your Generated Instances Truly Useful? GenBench-MILP: A Benchmark Suite for MILP Instance Generation
The proliferation of machine learning-based methods for Mixed-Integer Linear Programming (MILP) instance generation has surged, driven by the need for diverse training datasets. However, a critical question remains: Are these generated instances truly useful and realistic? Current evaluation protocols often rely on superficial structural metrics or simple solvability checks, which frequently fail to capture the true computational complexity of real-world problems. To bridge this gap, we introduce GenBench-MILP, a comprehensive benchmark suite designed for the standardized and objective evaluation of MILP generators. Our framework assesses instance quality across four key dimensions: mathematical validity, structural similarity, computational hardness, and utility in downstream tasks. A distinctive innovation of GenBench-MILP is the analysis of solver-internal features -- including root node gaps, heuristic success rates, and cut plane usage. By treating the solver's dynamic behavior as an expert assessment, we reveal nuanced computational discrepancies that static graph features miss. Our experiments on instance generative models demonstrate that instances with high structural similarity scores can still exhibit drastically divergent solver interactions and difficulty levels. By providing this multifaceted evaluation toolkit, GenBench-MILP aims to facilitate rigorous comparisons and guide the development of high-fidelity instance generators.
View on arXiv