We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph. We characterize regret in terms of the independence number of the strong product between the feedback graph and the communication network. Our analysis recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback. A more detailed version of our results also captures the dependence of the regret on the delay caused by the time the information takes to traverse each graph. Experiments run on synthetic data show that the empirical behavior of our algorithm is consistent with the theoretical results.
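To make the central quantity concrete, the following is a minimal sketch of how the independence number of a strong product can be computed for tiny graphs. It assumes undirected graphs stored as adjacency dictionaries; the helper names (`strong_product`, `independence_number`, `path`, `complete`, `empty`) are illustrative and not taken from the paper, and the brute-force search is exponential, so it is only meant for toy instances.

```python
from itertools import product


def strong_product(adj_g, adj_h):
    """Strong product of two undirected graphs given as {vertex: set_of_neighbors}.

    Distinct pairs (u, v) and (x, y) are adjacent when each coordinate is
    either equal or adjacent in its own graph.
    """
    nodes = list(product(adj_g, adj_h))
    adj = {n: set() for n in nodes}
    for (u, v) in nodes:
        for (x, y) in nodes:
            if (u, v) == (x, y):
                continue
            if (u == x or x in adj_g[u]) and (v == y or y in adj_h[v]):
                adj[(u, v)].add((x, y))
    return adj


def independence_number(adj):
    """Exact independence number by branching on one vertex at a time.

    Exponential time: intended only for very small graphs.
    """
    def solve(vertices):
        if not vertices:
            return 0
        v = next(iter(vertices))
        rest = vertices - {v}
        # Either leave v out, or take v and discard all of its neighbors.
        return max(solve(rest), 1 + solve(rest - adj[v]))

    return solve(frozenset(adj))


def path(n):
    """Path graph on vertices 0..n-1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def complete(n):
    """Complete graph: every vertex is adjacent to every other one."""
    return {i: set(range(n)) - {i} for i in range(n)}


def empty(n):
    """Edgeless graph on n vertices."""
    return {i: set() for i in range(n)}


# Hypothetical toy setting: 3 actions and 3 agents on a path network.
network = path(3)
alpha_path = independence_number(strong_product(path(3), network))
alpha_full = independence_number(strong_product(complete(3), network))
alpha_none = independence_number(strong_product(empty(3), network))
print(alpha_path, alpha_full, alpha_none)  # prints "4 2 6"
```

In this toy instance, a complete feedback graph (every action reveals every other, as in the expert setting) gives the independence number of the network alone, while an edgeless feedback graph (as in the bandit setting, under the assumption that each action reveals only itself) multiplies it by the number of actions.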