arXiv:2310.19794

Robust Causal Bandits for Linear Models

30 October 2023
Zirui Yan
Arpan Mukherjee
Burak Varici
A. Tajer
    CML
Abstract

Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^{\frac{1}{2L}}$, where $T$ is the time horizon and $L$ is the length of the longest causal path in the graph, the existing algorithms will have linear regret in $T$. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}\big(d^{L-\frac{1}{2}}(\sqrt{NT} + NC)\big)$ and lower bounded by $\Omega\big(d^{\frac{L}{2}-2}\max\{\sqrt{T}, d^2 C\}\big)$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.
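
To get a feel for how the two regret bounds in the abstract scale, the following is a minimal Python sketch that simply evaluates the stated expressions with constants and logarithmic factors dropped. It is not the paper's algorithm. The function names and the parameter values (N = 10, d = 3, L = 2, and the choice C = T^alpha) are assumptions made here for illustration only.

```python
import math

def upper_bound(T, C, N, d, L):
    """Upper-bound expression d^(L - 1/2) * (sqrt(N*T) + N*C).
    Constants and log factors hidden by the O-tilde are dropped."""
    return d ** (L - 0.5) * (math.sqrt(N * T) + N * C)

def lower_bound(T, C, d, L):
    """Lower-bound expression d^(L/2 - 2) * max(sqrt(T), d^2 * C).
    Constants hidden by the Omega are dropped."""
    return d ** (L / 2 - 2) * max(math.sqrt(T), d ** 2 * C)

def deviation_threshold(T, L):
    """Number of deviation instances, T^(1/(2L)), at which the abstract
    says existing algorithms already incur linear regret."""
    return T ** (1 / (2 * L))

# Hypothetical graph parameters, chosen only for illustration.
N, d, L = 10, 3, 2

for T in (10**4, 10**6, 10**8):
    for alpha in (0.25, 0.5, 0.75):
        C = T ** alpha  # assumed deviation budget C = T^alpha
        ub = upper_bound(T, C, N, d, L)
        lb = lower_bound(T, C, d, L)
        # ub / T shrinking as T grows indicates sub-linear growth.
        print(f"T={T:.0e} alpha={alpha}: upper/T={ub / T:.4f} lower/T={lb / T:.4f}")
```

Under these assumptions, the ratio of the upper-bound expression to T shrinks as T grows for every alpha below 1, which matches the abstract's claim of sub-linear regret beyond the $C = o(\sqrt{T})$ regime.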
