
RISE: Rule-Driven SQL Dialect Translation via Query Reduction

Xudong Xie
Yuwei Zhang
Wensheng Dou
Yu Gao
Ziyu Cui
Jiansen Song
Rui Yang
Jun Wei
Main:11 Pages
5 Figures
Bibliography:1 Pages
3 Tables
Abstract

Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation tools rely on manually-crafted rules, necessitating significant manual effort to support new RDBMSs and dialects. Although large language models (LLMs) can assist in translating SQL dialects, they often struggle with lengthy and complex SQL queries. In this paper, we propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex SQL queries. Given a complex source query $Q_c$ that contains a SQL dialect $d$, we first employ a dialect-aware query reduction technique to derive a simplified query $Q_s$ by removing $d$-irrelevant SQL elements from $Q_c$. Subsequently, we utilize LLMs to translate $Q_s$ into $Q_{s'}$, and automatically extract the translation rule $r_d$ for dialect $d$ based on the relationship between $Q_s$ and $Q_{s'}$. By applying $r_d$ to $Q_c$, we can effectively translate the dialect $d$ within $Q_c$, thereby bypassing the complexity of the source query $Q_c$. We evaluate RISE on two real-world benchmarks, i.e., TPC-DS and SQLProcBench, comparing its performance against both the traditional rule-based tools and the LLM-based approaches with respect to translation accuracy. RISE achieves accuracies of 97.98% on TPC-DS and 100% on SQLProcBench, outperforming the baselines by an average improvement of 24.62% and 238.41%, respectively.
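The reduce, translate, extract, and apply steps described in the abstract can be illustrated with a toy sketch. This is not the RISE implementation, whose reduction and rule-extraction algorithms the abstract does not specify. All function names here (`reduce_query`, `translate_simple`, `extract_rule`, `apply_rule`) are hypothetical, the LLM call is replaced by a hard-coded stub, and the dialect $d$ is assumed to be a single function name, using MySQL's `IFNULL` and its standard-SQL counterpart `COALESCE` as the example.

```python
import re
from typing import Tuple


def reduce_query(q_c: str, dialect_fn: str) -> str:
    """Toy reduction (assumption): keep only the first non-nested call to
    dialect_fn and wrap it in a bare SELECT, dropping everything else."""
    pattern = rf"\b{re.escape(dialect_fn)}\s*\([^()]*\)"
    match = re.search(pattern, q_c, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"{dialect_fn} not found in query")
    return f"SELECT {match.group(0)}"


def translate_simple(q_s: str) -> str:
    """Hard-coded stand-in for the LLM translation of the simplified query."""
    return re.sub(r"\bIFNULL\b", "COALESCE", q_s, flags=re.IGNORECASE)


def extract_rule(q_s: str, q_s_prime: str) -> Tuple[str, str]:
    """Toy rule extraction (assumption): the rule is the single token that
    differs between the simplified query and its translation."""
    src_tokens = re.findall(r"\w+|[^\w\s]", q_s)
    dst_tokens = re.findall(r"\w+|[^\w\s]", q_s_prime)
    if len(src_tokens) != len(dst_tokens):
        raise ValueError("toy extractor only handles token-for-token renames")
    diffs = [(a, b) for a, b in zip(src_tokens, dst_tokens) if a != b]
    if len(diffs) != 1:
        raise ValueError("toy extractor expects exactly one differing token")
    return diffs[0]


def apply_rule(q_c: str, rule: Tuple[str, str]) -> str:
    """Apply the rename rule to every call site in the complex query."""
    src, dst = rule
    return re.sub(rf"\b{re.escape(src)}\b(?=\s*\()", dst, q_c, flags=re.IGNORECASE)


q_c = (
    "SELECT c.id, IFNULL(c.nickname, c.name) AS label "
    "FROM customers c JOIN orders o ON o.cid = c.id WHERE o.total > 100"
)
q_s = reduce_query(q_c, "IFNULL")        # simplified query Q_s
q_s_prime = translate_simple(q_s)        # translated simplified query Q_s'
r_d = extract_rule(q_s, q_s_prime)       # translation rule r_d
translated = apply_rule(q_c, r_d)        # rule applied to the full query Q_c
print(translated)
```

In this sketch the stub translator only ever sees the short query, which mirrors the abstract's point that the LLM is shielded from the length and complexity of $Q_c$. A realistic reduction would also have to keep the simplified query valid SQL, which this toy version does not attempt.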
