Causal Markov Decision Processes: Learning Good Interventions Efficiently

15 February 2021
Yangyi Lu, A. Meisami, Ambuj Tewari
Abstract

We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas, such as digital healthcare and digital marketing, can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm, which exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an $\tilde{O}(HS\sqrt{ZT})$ regret bound, where $T$ is the total number of time steps, $H$ is the episodic horizon, and $S$ is the cardinality of the state space. Notably, our regret bound does not scale with the size of the action/intervention space ($A$), but only with a causal-graph-dependent quantity $Z$, which can be exponentially smaller than $A$. By extending C-UCBVI to the factored MDP setting, we propose the causal factored UCBVI (CF-UCBVI) algorithm, which further reduces the regret exponentially in terms of $S$. Furthermore, we show that RL algorithms for linear MDP problems can also be incorporated into C-MDPs. We empirically validate our algorithms and theoretical results in various settings, demonstrating the benefit of our causal approaches.
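
The abstract states the regret result but not the mechanics behind it. The sketch below illustrates, under stated assumptions, where a causal-graph quantity like $Z$ could enter a UCBVI-style optimistic backup: the exploration bonus scales with $Z$ rather than the action count $A$. The function name `ucbvi_backup`, the Hoeffding-style bonus, and the toy data are illustrative assumptions only; the paper's actual C-UCBVI bonus and its definition of $Z$ are given in the paper itself.

```python
import numpy as np

def ucbvi_backup(counts, rewards_sum, trans_counts, H, Z, delta=0.05):
    """One optimistic value-iteration sweep over a tabular episodic MDP.

    counts[s, a]          -- visit counts N(s, a)
    rewards_sum[s, a]     -- cumulative observed reward for (s, a)
    trans_counts[s, a, t] -- observed transition counts to next state t
    Z                     -- effective intervention-set size; standing in for
                             the paper's causal-graph-dependent quantity, which
                             can be exponentially smaller than A (assumption).
    """
    S, A = counts.shape
    N = np.maximum(counts, 1)                # avoid division by zero
    r_hat = rewards_sum / N                  # empirical mean rewards
    P_hat = trans_counts / N[:, :, None]     # empirical transition kernel
    # Hoeffding-style exploration bonus; sqrt-log dependence on Z instead of A
    # mimics the causal savings claimed in the abstract (illustrative form).
    bonus = H * np.sqrt(np.log(S * Z * H / delta) / N)

    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):             # backward induction over horizon
        # optimistic Q-values, clipped at the maximum achievable return H
        Q[h] = np.minimum(r_hat + P_hat @ V[h + 1] + bonus, H)
        V[h] = Q[h].max(axis=1)
    return Q, V

# Toy usage with hypothetical data: S=3 states, A=4 actions, horizon H=2.
S, A, H = 3, 4, 2
rng = np.random.default_rng(0)
counts = rng.integers(1, 20, size=(S, A)).astype(float)
rewards_sum = rng.random((S, A)) * counts
p = rng.dirichlet(np.ones(S), size=(S, A))  # random transition kernels
trans_counts = p * counts[:, :, None]
Q, V = ucbvi_backup(counts, rewards_sum, trans_counts, H, Z=2)
```

The key design point the sketch highlights is that optimism enters only through the bonus term, so any analysis that shrinks the bonus's dependence from $A$ to a smaller quantity $Z$ shrinks the regret bound accordingly.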
