Crowdsourced Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

Crowdsourcing systems have emerged as an effective platform to label data and classify objects with relatively low cost by exploiting non-expert workers. To ensure reliable recovery of unknown labels with as few number of queries as possible, we consider an effective query type that asks "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size is even or odd. The subset size , which we call query degree, can be varying over queries. Since a worker needs to make more efforts to answer a query of a higher degree, we consider a noise model where the accuracy of worker's answer changes depending both on the worker reliability and query degree . For this general model, we characterize the information-theoretic limit on the optimal number of queries to reliably recover labels in terms of a given combination of degree- queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
View on arXiv