
Crowdsourced Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2020
Abstract

Crowdsourcing systems have emerged as an effective platform for labeling data and classifying objects at relatively low cost by exploiting non-expert workers. To ensure reliable recovery of unknown labels with as few queries as possible, we consider an effective query type that asks for a "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying $m$ binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size $d$ is even or odd. The subset size $d$, which we call the query degree, can vary across queries. Since a worker needs to make more effort to answer a query of a higher degree, we consider a noise model where the accuracy of a worker's answer depends on both the worker's reliability and the query degree $d$. For this general model, we characterize the information-theoretic limit on the optimal number of queries to reliably recover the $m$ labels in terms of a given combination of degree-$d$ queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
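
To make the query type concrete, here is a minimal Python sketch of a degree-d XOR query and a noisy worker answer. The function names, the flip-probability parameter, and the rule tying the error rate to the degree are all assumptions made for illustration; the abstract does not specify the form of the noise model, and this is not the paper's inference algorithm.

```python
import random


def xor_query(labels, subset):
    """Noiseless XOR answer: 1 if an odd number of the chosen objects
    have the attribute (label 1), 0 if the count is even."""
    return sum(labels[i] for i in subset) % 2


def flip_prob_for_degree(base, d):
    """Hypothetical rule (an assumption, not from the paper): the error
    probability grows linearly with the query degree d, capped below 1/2."""
    return min(0.49, base * d)


def noisy_xor_query(labels, subset, flip_prob, rng):
    """Return the true parity, flipped with probability flip_prob."""
    answer = xor_query(labels, subset)
    if rng.random() < flip_prob:
        return answer ^ 1
    return answer


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1, 0, 1, 1, 0]      # m = 5 hidden binary labels
    subset = [0, 2, 3]            # a degree-3 query
    p = flip_prob_for_degree(0.05, len(subset))
    print(noisy_xor_query(labels, subset, p, rng))
```

The query degree is simply the length of `subset`, so a mix of degrees corresponds to issuing queries with subsets of different sizes.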
