
Estimating Coverage in Streams via a Modified CVM Method

Main: 4 pages
2 figures
Bibliography: 2 pages
Appendix: 1 page
Abstract

When the individuals in a population can be classified into classes or categories, the coverage of a sample, C, is defined as the probability that a randomly selected individual from the population belongs to a class represented in the sample. Estimating coverage is challenging because C is not a fixed population parameter but a property of the sample, and the task becomes harder when the number of classes is unknown. Furthermore, this problem has not been addressed in scenarios where data arrive as a stream, under the constraint that only n elements can be stored at a time. In this paper, we propose a simple and efficient method to estimate C in streaming settings, based on a straightforward modification of the CVM algorithm, which is commonly used to estimate the number of distinct elements in a data stream.
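To make the two ingredients of the abstract concrete, the following Python sketch shows (i) the definition of coverage when the class probabilities are known, and (ii) the standard CVM distinct-elements estimator that the paper takes as its starting point. It is an illustration under stated assumptions, not the modified method proposed in the paper, which the abstract does not spell out. The function names `sample_coverage` and `cvm_distinct_estimate`, and the parameter names `class_probs` and `buffer_size` (playing the role of n), are hypothetical.

```python
import random


def sample_coverage(sample, class_probs):
    """Coverage C of a sample when class probabilities are known.

    C is the total probability mass of the classes that appear at least
    once in the sample. In practice class_probs is unknown, which is why
    C has to be estimated.
    """
    return sum(class_probs[c] for c in set(sample))


def cvm_distinct_estimate(stream, buffer_size, rng=None):
    """Baseline CVM estimate of the number of distinct elements.

    Assumption: this is the unmodified CVM scheme, shown only as the
    starting point named in the abstract. At most buffer_size elements
    are held in memory at any time.
    """
    if rng is None:
        rng = random.Random()
    p = 1.0          # current sampling probability
    buffer = set()   # elements currently retained
    for item in stream:
        # Forget any earlier decision about this item, then keep it
        # with the current probability p.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)
        if len(buffer) >= buffer_size:
            # Buffer full: drop each retained element with probability 1/2
            # and halve the sampling probability for future items.
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2.0
            if len(buffer) >= buffer_size:
                # Rare failure event of the original algorithm.
                raise RuntimeError("CVM failure: buffer still full after thinning")
    # Each retained element stands for 1/p distinct elements.
    return len(buffer) / p
```

When `buffer_size` exceeds the number of distinct elements in the stream, no thinning ever occurs, `p` stays at 1, and the estimate equals the exact distinct count. With a smaller buffer the result is a randomized estimate whose accuracy depends on `buffer_size`.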
