Estimating Coverage in Streams via a Modified CVM Method
When individuals in a population can be classified into classes or categories, the coverage of a sample is defined as the probability that a randomly selected individual from the population belongs to a class represented in the sample. Estimating coverage is challenging because it is not a fixed population parameter but a property of the sample, and the task becomes more complex when the number of classes is unknown. Furthermore, this problem has not been addressed in scenarios where data arrive as a stream, under the constraint that only a bounded number of elements can be stored at a time. In this paper, we propose a simple and efficient method to estimate coverage in streaming settings, based on a straightforward modification of the CVM algorithm, which is commonly used to estimate the number of distinct elements in a data stream.
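For orientation, the following is a minimal sketch of the standard CVM distinct-elements estimator that the abstract refers to. It is not the coverage modification proposed in the paper, which the abstract does not spell out. The function name `cvm_distinct_estimate` and the parameters `buffer_size` and `rng` are hypothetical, chosen for illustration. One deviation is assumed for simplicity: when the buffer is still full after a halving round, the sketch halves again rather than reporting failure.

```python
import random


def cvm_distinct_estimate(stream, buffer_size, rng=None):
    """Estimate the number of distinct items in `stream` while storing
    fewer than `buffer_size` items at any time (CVM-style sketch)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    buf = set()      # items currently retained
    for item in stream:
        # Forget any earlier decision about this item, then re-sample it,
        # so that only its latest occurrence determines membership.
        buf.discard(item)
        if rng.random() < p:
            buf.add(item)
        # When the buffer fills up, drop each retained item with
        # probability 1/2 and halve the sampling probability.
        # (Assumption: repeat instead of failing if it is still full.)
        while len(buf) >= buffer_size:
            buf = {x for x in buf if rng.random() < 0.5}
            p /= 2.0
    # Each surviving item stands for roughly 1/p distinct items.
    return len(buf) / p


# With fewer distinct items than the buffer can hold, p stays at 1
# and the estimate is exact.
print(cvm_distinct_estimate(["a", "b", "a", "c", "b"], buffer_size=10))  # → 3.0
```

The buffer holds a uniform-like subsample of the distinct items seen so far, which is the property a coverage estimator in this setting would presumably build on.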