Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

12 June 2024

Feiran Huang

Xiao Huang

Abstract

Generating accurate SQL from users' natural language questions (text-to-SQL) remains a long-standing challenge due to the complexities involved in understanding user questions, comprehending database schemas, and generating SQL. Traditional text-to-SQL systems, which combine human engineering and deep neural networks, have made significant progress. Pre-trained language models (PLMs) were subsequently developed for text-to-SQL tasks and achieved promising results. However, as modern databases and user questions grow more complex, PLMs with limited parameter sizes often produce incorrect SQL. This calls for more sophisticated and tailored optimization methods, which in turn limits the applicability of PLM-based systems. Recently, large language models (LLMs) have demonstrated strong natural language understanding capabilities that improve as model scale increases. Integrating LLM-based solutions therefore brings new opportunities and improvements to text-to-SQL research. In this survey, we provide a comprehensive review of existing LLM-based text-to-SQL studies. Specifically, we give a brief overview of the technical challenges and the evolution of text-to-SQL. Next, we introduce the datasets and metrics designed to evaluate text-to-SQL systems. We then present a systematic analysis of recent advances in LLM-based text-to-SQL. Finally, we summarize the survey, discuss the remaining challenges in this field, and outline directions for future research.
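
The abstract mentions metrics for evaluating text-to-SQL systems. One commonly used metric in this area is execution accuracy: a predicted query counts as correct if running it on the database returns the same result as running the reference (gold) query. The following is a minimal sketch of that idea using Python's built-in sqlite3 module, not the scoring script of any particular benchmark. The function names, the hypothetical `singer` table, and the choice to compare results as unordered multisets of rows are assumptions for illustration; official evaluation scripts differ in details such as how ORDER BY is handled.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if the predicted query yields the same rows as the gold query.

    Assumption: rows are compared as an unordered multiset (row order ignored,
    duplicate rows still counted). Real benchmark scripts may be stricter.
    """
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute is counted as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", 30), ("Bob", 25), ("Cid", 41)],
    )
    pairs = [
        # Different SQL text, same result rows: counted as correct.
        ("SELECT name FROM singer WHERE age > 28", "SELECT name FROM singer WHERE age >= 29"),
        # Same rows in a different order: counted as correct under this sketch.
        ("SELECT name FROM singer ORDER BY age", "SELECT name FROM singer"),
        # Misspelled column raises sqlite3.OperationalError: counted as incorrect.
        ("SELECT nme FROM singer", "SELECT name FROM singer"),
        # Executes fine but returns different rows: counted as incorrect.
        ("SELECT name FROM singer WHERE age < 30", "SELECT name FROM singer WHERE age < 20"),
    ]
    print(execution_accuracy(conn, pairs))  # prints 0.5
```

Comparing execution results rather than SQL strings accepts semantically equivalent queries written differently, as in the first pair; the trade-off is that a wrong query can match by coincidence on a small database.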

@article{hong2025_2406.08426,
  title={ Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL },
  author={ Zijin Hong and Zheng Yuan and Qinggang Zhang and Hao Chen and Junnan Dong and Feiran Huang and Xiao Huang },
  journal={arXiv preprint arXiv:2406.08426},
  year={ 2025 }
}