In this paper, we study the problem of maximizing the difference between an adaptive submodular (revenue) function and a non-negative modular (cost) function in the adaptive setting. The input to our problem is a set of items, where each item has a particular state drawn from a known prior distribution. The revenue function is defined over items and states, and the cost function is defined over items, i.e., each item has a fixed cost. The state of each item is initially unknown; one must select an item in order to observe its realized state. A policy specifies which item to pick next based on the observations made so far. Our objective is to identify the policy that maximizes the expected revenue minus the expected cost under a cardinality constraint. Since our objective function can take on both negative and positive values, existing results on submodular maximization may not be applicable. To overcome this challenge, we develop a series of effective solutions with performance guarantees, each stated relative to the optimal policy. For the case when the revenue function is adaptive monotone and adaptive submodular, we develop an effective policy with a provable approximation bound, using only value oracle queries. For the case when the revenue function is adaptive submodular, we present a randomized policy with a provable approximation bound.
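To make the setting concrete, the following is a minimal sketch of an adaptive policy on a toy instance. Everything in it is an assumption for illustration: the item names, the coverage-style revenue, the independent on/off states, and the plain greedy selection rule. It is not the policy developed in the paper and carries none of its guarantees; it only shows how a policy picks an item, observes its state, and then conditions the next pick on that observation while paying a fixed cost per item.

```python
import random

# Hypothetical toy instance: an item covers its targets only if its state is "on".
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
P_ON = {"a": 0.5, "b": 0.9, "c": 0.8}   # known prior; states assumed independent
COST = {"a": 0.4, "b": 0.5, "c": 1.0}   # non-negative modular (per-item) cost


def expected_gain(item, covered):
    """Expected marginal revenue of `item` given the targets already covered."""
    return P_ON[item] * len(COVERS[item] - covered)


def adaptive_greedy(k, realization):
    """Pick at most k items, observing each picked item's state before the next pick.

    Returns the picked items (in order) and the realized revenue minus cost.
    """
    picked, covered, cost = [], set(), 0.0
    while len(picked) < k:
        candidates = [i for i in COVERS if i not in picked]
        if not candidates:
            break
        best = max(candidates, key=lambda i: expected_gain(i, covered) - COST[i])
        if expected_gain(best, covered) - COST[best] <= 0:
            break  # no remaining item is worth its cost in expectation
        picked.append(best)
        cost += COST[best]
        if realization[best]:  # the state is revealed only after selection
            covered |= COVERS[best]
    return picked, len(covered) - cost


def sample_realization(rng):
    """Draw one state per item from the prior."""
    return {i: rng.random() < P_ON[i] for i in COVERS}


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [adaptive_greedy(2, sample_realization(rng))[1] for _ in range(10_000)]
    print("estimated expected revenue minus cost:", sum(runs) / len(runs))
```

Because item "c" costs more than its expected revenue, the sketch never selects it, which illustrates why the objective can be negative for some choices and why stopping before the cardinality limit can be preferable.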