Crowdsourced Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

Abstract

Crowdsourcing systems have emerged as an effective platform for labeling data and classifying objects at relatively low cost by exploiting non-expert workers. To ensure reliable recovery of unknown labels with as few queries as possible, we consider an effective query type that asks for a "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying $m$ binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size $d$ is even or odd. The subset size $d$, which we call the query degree, can vary across queries. Since a worker needs to make more effort to answer a query of a higher degree, we consider a noise model in which the accuracy of a worker's answer depends on both the worker's reliability and the query degree $d$. For this general model, we characterize the information-theoretic limit on the optimal number of queries needed to reliably recover the $m$ labels in terms of a given combination of degree-$d$ queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
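To make the query type concrete, the following is a minimal Python sketch of a single noisy XOR query of degree d. It is an illustration only, not the paper's model or algorithm: the function names and the particular degree-dependent flip probability in `example_flip_prob` are assumptions introduced here, and the sketch omits worker reliability entirely.

```python
import random


def xor_query(labels, subset):
    """Noiseless XOR answer: 1 if an odd number of objects in `subset`
    have the attribute (label 1), else 0."""
    return sum(labels[i] for i in subset) % 2


def example_flip_prob(d):
    """Hypothetical noise level that grows with the query degree d.
    This specific form is an assumption, not taken from the paper."""
    return min(0.45, 0.05 * d)


def noisy_xor_query(labels, d, flip_prob, rng):
    """Pick a random subset of size d and return (subset, noisy answer).
    The true parity is flipped with probability flip_prob(d)."""
    subset = rng.sample(range(len(labels)), d)
    answer = xor_query(labels, subset)
    if rng.random() < flip_prob(d):
        answer ^= 1  # the worker reports the wrong parity
    return subset, answer


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.randint(0, 1) for _ in range(10)]  # m = 10 binary labels
    for d in (1, 2, 4):
        subset, answer = noisy_xor_query(labels, d, example_flip_prob, rng)
        print(f"degree {d}: subset={sorted(subset)} answer={answer}")
```

A degree-1 query reduces to asking for a single label directly, while higher degrees combine several labels into one answer at the cost of a noisier response under this assumed noise function.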
