Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

21 January 2023
Shuaichen Chang
J. Wang
Mingwen Dong
Lin Pan
Henghui Zhu
A. Li
Wuwei Lan
Shenmin Zhang
Jiarong Jiang
Joseph Lilien
Stephen M. Ash
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Bing Xiang
Abstract

Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations, and previously curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure robustness from different angles. To collect more diverse natural question perturbations, we use large pretrained language models (PLMs) to simulate human behaviors in creating natural questions. We conduct a diagnostic study of state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers a 14.0% performance drop overall and a 50.7% performance drop on the most challenging perturbation. We also present a breakdown analysis of text-to-SQL model designs and provide insights for improving model robustness.
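The numbers quoted above are accuracy gaps between the original and the perturbed evaluation sets. Below is a minimal sketch (not the authors' code) of how such a robustness drop can be computed: run the same text-to-SQL model on original Spider-style examples and on their perturbed counterparts (e.g. paraphrased questions), then compare exact-match accuracies. The `predict_sql` callable and the example field names are hypothetical placeholders.

```python
from typing import Callable, Dict, List


def accuracy(predict_sql: Callable[[str, str], str],
             examples: List[Dict[str, str]],
             question_key: str) -> float:
    """Fraction of examples whose predicted SQL exactly matches the gold SQL."""
    correct = sum(
        predict_sql(ex[question_key], ex["schema"]) == ex["gold_sql"]
        for ex in examples
    )
    return correct / len(examples)


def robustness_drop(predict_sql: Callable[[str, str], str],
                    examples: List[Dict[str, str]]) -> float:
    """Absolute accuracy drop when original questions are replaced by perturbed ones."""
    pre = accuracy(predict_sql, examples, "question")             # original questions
    post = accuracy(predict_sql, examples, "perturbed_question")  # e.g. PLM paraphrases
    return pre - post


if __name__ == "__main__":
    # Toy example and a trivial "model" purely to show the evaluation flow.
    data = [
        {"question": "How many singers are there?",
         "perturbed_question": "What is the total number of singers?",
         "schema": "singer(id, name)",
         "gold_sql": "SELECT count(*) FROM singer"},
    ]
    dummy_model = lambda question, schema: "SELECT count(*) FROM singer"
    print(f"accuracy drop: {robustness_drop(dummy_model, data):.1%}")
```

In the actual benchmark, exact-match comparison is typically replaced by Spider's execution- or component-level matching, and the perturbed sets also cover database and SQL perturbations, not just question paraphrases.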
