5
0

CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models

Abstract

Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches including manual rewriting, rule-based systems, and large language model (LLM)-based techniques often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., LLM generates non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy complex SQL via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases

View on arXiv
@article{zhou2025_2504.00882,
  title={ CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models },
  author={ Wei Zhou and Yuyang Gao and Xuanhe Zhou and Guoliang Li },
  journal={arXiv preprint arXiv:2504.00882},
  year={ 2025 }
}
Comments on this paper