How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

25 February 2025

Chengpiao Huang

Yuhang Wu

Kaizheng Wang

Main: 56 pages; 13 figures; bibliography: 5 pages; 7 tables

Abstract

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield excessively loose estimates. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
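
The sample-size trade-off described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's procedure: responses are modeled as Bernoulli draws, the human and LLM "yes" rates (`p_human`, `p_llm`) are hypothetical values chosen for illustration, and the confidence set is a standard Wald interval built from the simulated responses alone. The function name `coverage_and_width` is likewise made up for this example.

```python
import math
import random


def coverage_and_width(k, p_human=0.5, p_llm=0.6, trials=1000, z=1.96, seed=0):
    """Estimate how often a Wald interval built from k simulated responses
    covers the human parameter, and the interval's average width.

    p_human and p_llm are hypothetical values; their gap stands in for the
    distribution shift between the simulated and real populations.
    """
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(trials):
        # Draw k simulated (LLM) responses with success probability p_llm.
        successes = sum(rng.random() < p_llm for _ in range(k))
        p_hat = successes / k
        # Half-width of the normal-approximation (Wald) interval.
        half = z * math.sqrt(p_hat * (1.0 - p_hat) / k)
        total_width += 2.0 * half
        # Coverage is judged against the human parameter, not the LLM one.
        if p_hat - half <= p_human <= p_hat + half:
            hits += 1
    return hits / trials, total_width / trials


if __name__ == "__main__":
    for k in (10, 50, 200, 1000):
        cov, width = coverage_and_width(k)
        print(f"k={k:5d}  coverage={cov:.3f}  mean width={width:.3f}")
```

In this toy setting, a larger `k` shrinks the interval around the LLM's rate rather than the human one, so coverage of the human parameter falls as the interval narrows, while a small `k` keeps coverage higher at the cost of a wide interval. The paper's contribution is a data-driven rule for choosing this sample size; the sketch only shows why the choice matters.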