Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks

International Conference on Signal Processing and Communications (ICSPC), 2023

24 October 2023

Hitesh Gudwani

Main: 8 pages, 6 figures, 2 tables; Bibliography: 2 pages; Appendix: 11 pages

Abstract

We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with its own level of accuracy, and in each time slot one source is scheduled to send its update to the monitoring station over the channel. The probability that an attempted transmission succeeds depends on which source is scheduled. Both the probability of correct measurement and the probability of successful transmission are unknown to the scheduler for every source. The metric of interest is the reward received by the system, which depends on the accuracy of the last update received by the destination and on the Age-of-Information (AoI) of the system. We model the scheduling problem as a variant of the multi-armed bandit problem in which the sources are the arms. Via simulations, we compare the performance of four standard bandit policies, namely Explore-Then-Commit (ETC), $\epsilon$-greedy, Upper Confidence Bound (UCB), and Thompson Sampling (TS), each suitably adjusted to our system model. In addition, we provide analytical guarantees for two of these policies, ETC and $\epsilon$-greedy. Finally, we characterize a lower bound on the cumulative regret achievable by any policy.
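To make the setup concrete, the following is a minimal sketch of an $\epsilon$-greedy scheduler for a system of this kind. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the reward function, so the per-slot reward used here (accuracy of the last delivered update minus a weighted AoI), the assumption that the scheduler observes this reward, and all names (`simulate_epsilon_greedy`, `p_correct`, `p_success`, `age_weight`) are hypothetical.

```python
import random


def simulate_epsilon_greedy(p_correct, p_success, horizon,
                            epsilon=0.1, age_weight=0.05, seed=0):
    """Epsilon-greedy scheduling with sources as arms.

    p_correct[i]: probability that source i measures correctly (hidden from the policy).
    p_success[i]: probability that a transmission from source i succeeds (hidden).
    Returns (total_reward, counts), where counts[i] is how often source i was scheduled.
    """
    rng = random.Random(seed)
    k = len(p_correct)
    counts = [0] * k      # number of slots in which each source was scheduled
    means = [0.0] * k     # empirical mean reward observed when scheduling each source
    age = 1               # AoI: slots elapsed since the last delivered update
    last_correct = 0      # 1 if the last delivered update was accurate, else 0
    total = 0.0

    for t in range(horizon):
        if t < k:
            arm = t                                   # schedule every source once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                    # explore uniformly at random
        else:
            arm = max(range(k), key=lambda i: means[i])  # exploit the best estimate

        if rng.random() < p_success[arm]:
            # Update delivered: AoI resets and accuracy is redrawn for this source.
            last_correct = 1 if rng.random() < p_correct[arm] else 0
            age = 1
        else:
            # Transmission failed: the monitor keeps the old update, which ages.
            age += 1

        # ASSUMED reward shape (not from the paper): accuracy minus weighted AoI.
        reward = last_correct - age_weight * age

        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward

    return total, counts
```

For example, `simulate_epsilon_greedy([0.9, 0.2, 0.2], [0.9, 0.2, 0.2], horizon=5000)` runs the policy on three hypothetical sources. Because the reward depends on the AoI and on the last delivered update, it carries state across slots, which is one way this setting departs from a standard stateless bandit.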
