Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems

Proceedings of the VLDB Endowment (PVLDB), 2025

1 March 2025

Main: 5 pages; 1 figure; Bibliography: 3 pages; 2 tables

Abstract

AI-augmented data processing systems (DPSs) integrate large language models (LLMs) into query pipelines, allowing powerful semantic operations on structured and unstructured data. However, the reliability (a.k.a. trust) of these systems is fundamentally challenged by the potential for LLMs to produce errors, limiting their adoption in critical domains. To help address this reliability bottleneck, we introduce semantic integrity constraints (SICs) -- a declarative abstraction for specifying and enforcing correctness conditions over LLM outputs in semantic queries. SICs generalize traditional database integrity constraints to semantic settings, supporting common types of constraints, such as grounding, soundness, and exclusion, with both proactive and reactive enforcement strategies.
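
To make the idea of declaring constraints over LLM outputs more concrete, here is a minimal sketch of reactive enforcement: generate, check, and retry on violation. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Constraint`, `grounding`, `exclusion`, and `enforce_reactively` are hypothetical, and the simple string-matching readings of "grounding" and "exclusion" are ours, not the authors' definitions. The LLM is replaced by a stub function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# All names below are hypothetical; none come from the paper or its code.

@dataclass
class Constraint:
    name: str
    # (source_text, llm_output) -> True if the constraint is satisfied
    check: Callable[[str, str], bool]


def grounding(source: str, output: str) -> bool:
    """Assumed reading of grounding: the output appears verbatim in the source."""
    return output.lower() in source.lower()


def exclusion(banned: List[str]) -> Callable[[str, str], bool]:
    """Assumed reading of exclusion: the output contains none of the banned terms."""
    def check(source: str, output: str) -> bool:
        return not any(term.lower() in output.lower() for term in banned)
    return check


def enforce_reactively(
    llm: Callable[[str, int], str],
    source: str,
    constraints: List[Constraint],
    max_retries: int = 2,
) -> Optional[str]:
    """Call the model, validate its output, and retry on any violation.

    Returns the first output that satisfies every constraint, or None if
    all attempts fail.
    """
    for attempt in range(max_retries + 1):
        output = llm(source, attempt)
        if all(c.check(source, output) for c in constraints):
            return output
    return None


def stub_llm(source: str, attempt: int) -> str:
    # Stand-in for a real model: the first answer is not in the source text.
    return "aspirin 500 mg" if attempt == 0 else "ibuprofen 200 mg"


NOTE = "Patient was given ibuprofen 200 mg twice daily."
result = enforce_reactively(
    stub_llm,
    NOTE,
    [Constraint("grounding", grounding),
     Constraint("exclusion", exclusion(["unknown"]))],
)
print(result)
```

In this sketch the first stub answer fails the grounding check because it does not occur in the source text, so the loop retries and accepts the second answer. A real system would presumably need richer checks than substring matching; this only illustrates the declare-then-enforce shape.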
