arXiv:2011.04686
Thompson sampling for linear quadratic mean-field teams

9 November 2020
Mukul Gagrani
Sagar Sudhakara
Aditya Mahajan
A. Nayyar
Ouyang Yi
Abstract

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}}\big(|M|^{1.5}\sqrt{T}\big)$ irrespective of the total number of agents, where the $\tilde{\mathcal{O}}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.
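To make the coupling term concrete, the following is a minimal sketch of computing a mean-field as the empirical mean of agent states. It is not the paper's algorithm. Grouping the mean by agent type is an assumption made here for illustration, and the function name `mean_field_by_type` and the toy data are hypothetical.

```python
import numpy as np

def mean_field_by_type(states, types):
    """Return {type: empirical mean of the states of agents of that type}.

    Illustrative helper (hypothetical name); assumes one mean-field per type.
    """
    states = np.asarray(states, dtype=float)
    types = np.asarray(types)
    return {m: states[types == m].mean(axis=0) for m in np.unique(types)}

# Hypothetical toy data: 5 agents of 2 types, each with a scalar state.
states = np.array([[1.0], [3.0], [2.0], [4.0], [6.0]])
types = np.array([0, 0, 1, 1, 1])

# Empirical mean of the states within each type.
mf = mean_field_by_type(states, types)

# Each agent's deviation from the mean-field of its own type.
deviations = states - np.stack([mf[m] for m in types])
```

In this sketch the number of quantities that summarize the population depends on the number of types rather than the number of agents, which matches the flavor of the abstract's claim that the regret bound scales with $|M|$ and not with the total number of agents.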
