DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Proceedings of the VLDB Endowment (PVLDB), 2024
Aditya G. Parameswaran
Main: 17 pages
8 figures
Bibliography: 2 pages
8 tables
Appendix: 3 pages
Abstract

Analyzing unstructured data, such as complex documents, has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered unstructured data processing. However, these frameworks focus on reducing the cost of executing user-specified operations with LLMs rather than on improving accuracy, and they execute most operations as-is. This is problematic for complex tasks and data, where LLM outputs for user-defined operations are often inaccurate, even with optimized prompts.
