
NLCTables: A Dataset for Marrying Natural Language Conditions with Table Discovery

Abstract

With the growing abundance of repositories containing tabular data, discovering relevant tables for in-depth analysis remains challenging. Existing table discovery methods primarily retrieve tables based on a query table or a few vague keywords, which leaves users to filter large result sets by hand. To address this limitation, we propose a new task: NL-conditional table discovery (nlcTD), in which users combine a query table with natural language (NL) requirements to refine search results. To advance research in this area, we present nlcTables, a comprehensive benchmark dataset comprising 627 diverse queries spanning NL-only, union, join, and fuzzy conditions, 22,080 candidate tables, and 21,200 relevance annotations. Our evaluation of six state-of-the-art table discovery methods on nlcTables reveals substantial performance gaps, highlighting the need for more advanced techniques to tackle the nlcTD scenario. The dataset, construction framework, and baseline implementations are publicly available at this https URL to foster future research.
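To make the shape of an nlcTD query concrete, here is a minimal Python sketch. It is not the authors' method or their released baselines. It assumes a query is a table plus an NL requirement. Interpreting the NL text is exactly the hard part the benchmark targets, so the sketch stands in a hand-written predicate for the condition. The names `Table`, `NLCQuery`, `header_jaccard`, and `nlc_search` are hypothetical, and column-header Jaccard overlap is only a simple stand-in for a real unionability or joinability score.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Table:
    name: str
    columns: list[str]
    rows: list[dict[str, object]]


@dataclass
class NLCQuery:
    """Hypothetical nlcTD query: a query table plus an NL requirement.

    `nl_text` is kept for reference only; `condition` is a hand-written
    predicate standing in for whatever a real system would derive from it.
    """
    table: Table
    nl_text: str
    condition: Callable[[Table], bool]


def header_jaccard(a: Table, b: Table) -> float:
    """Stand-in similarity score: Jaccard overlap of lower-cased column headers."""
    sa = {c.lower() for c in a.columns}
    sb = {c.lower() for c in b.columns}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def nlc_search(query: NLCQuery, lake: list[Table], k: int = 5) -> list[str]:
    """Keep candidates that satisfy the NL condition, then rank them by similarity."""
    kept = [t for t in lake if query.condition(t)]
    ranked = sorted(kept, key=lambda t: header_jaccard(query.table, t), reverse=True)
    return [t.name for t in ranked[:k]]


if __name__ == "__main__":
    q_table = Table("movies_q", ["title", "year", "director"], [])
    lake = [
        Table("films_a", ["title", "year", "director"],
              [{"title": "A", "year": 2015, "director": "X"}]),
        Table("films_b", ["title", "year", "genre"],
              [{"title": "B", "year": 1990, "genre": "drama"}]),
        Table("stocks", ["ticker", "price"], [{"ticker": "T", "price": 1.0}]),
    ]
    # Hypothetical NL requirement, translated into a predicate by hand.
    q = NLCQuery(
        q_table,
        "only tables about movies released after 2010",
        lambda t: any(r.get("year", 0) > 2010 for r in t.rows),
    )
    print(nlc_search(q, lake))  # ['films_a']
```

Without the condition, `films_b` would still rank highly on header overlap alone; the NL requirement is what removes it. Filtering before ranking is just one simple design choice; a real system might instead fold the NL condition into the relevance score.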

@article{cui2025_2504.15849,
  title={NLCTables: A Dataset for Marrying Natural Language Conditions with Table Discovery},
  author={Lingxi Cui and Huan Li and Ke Chen and Lidan Shou and Gang Chen},
  journal={arXiv preprint arXiv:2504.15849},
  year={2025}
}