ElicitationGPT: Text Elicitation Mechanisms via Language Models
Main: 15 Pages
1 Figure
Bibliography: 3 Pages
8 Tables
Appendix: 17 Pages
Abstract
Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information and the training of machine learning models. This paper develops mechanisms for scoring elicited text against ground truth text using domain-knowledge-free queries to a large language model (specifically ChatGPT) and empirically evaluates their alignment with human preferences. The empirical evaluation is conducted on peer reviews from a peer-grading dataset, comparing the mechanisms' scores against manual instructor scores for the same peer reviews.
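For readers unfamiliar with the scoring rules mentioned in the abstract, the following is a minimal sketch of two standard proper scoring rules for numeric forecasts (the quadratic and logarithmic rules). It does not reproduce the paper's text-scoring mechanisms; the function names, the two-state example, and the probabilities are assumptions chosen for illustration only.

```python
import math

# Illustrative only: classic proper scoring rules for probabilistic forecasts,
# NOT the text-scoring mechanisms developed in the paper.

def quadratic_score(forecast, outcome):
    """Quadratic (Brier-style) score; higher is better.

    forecast: dict mapping each state to its reported probability.
    outcome:  the realized state.
    """
    return 2 * forecast[outcome] - sum(p * p for p in forecast.values())

def log_score(forecast, outcome):
    """Logarithmic score: log of the probability assigned to the realized state."""
    return math.log(forecast[outcome])

def expected_score(rule, report, belief):
    """Expected score of a report when states are drawn from the forecaster's belief."""
    return sum(belief[state] * rule(report, state) for state in belief)

# Hypothetical belief and a deliberately exaggerated report.
belief = {"rain": 0.7, "dry": 0.3}
exaggerated = {"rain": 0.9, "dry": 0.1}

honest_quadratic = expected_score(quadratic_score, belief, belief)
shaded_quadratic = expected_score(quadratic_score, exaggerated, belief)
honest_log = expected_score(log_score, belief, belief)
shaded_log = expected_score(log_score, exaggerated, belief)
```

Under both rules, reporting the true belief yields a higher expected score than the exaggerated report, which is the property (properness) that makes such rules usable for incentivized elicitation.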
