Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

24 July 2018

Balaji Lakshminarayanan

Prav Srinivasan

AI4TS

Papers citing "Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems"

8 / 8 papers shown

Multi-Agent Reinforcement Learning for Long-Term Network Resource Allocation through Auction: a V2X Application

Computer Communications (Comput. Commun.), 2022

182

29 Jul 2022

Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games

Jing Tan

R. Khalili

Holger Karl

114

05 Apr 2022

A Conceptual Framework for Externally-influenced Agents: An Assisted Reinforcement Learning Review

405

03 Jul 2020

Stochastic bandits with arm-dependent delays

279

18 Jun 2020

An empirical investigation of the challenges of real-world reinforcement learning

464

130

24 Mar 2020

Nonstochastic Multiarmed Bandits with Unrestricted Delays

Neural Information Processing Systems (NeurIPS), 2019

Tobias Sommer Thune

Nicolò Cesa-Bianchi

Yevgeny Seldin

395

03 Jun 2019

Challenges of Real-World Reinforcement Learning

431

640

29 Apr 2019

Linear Bandits with Stochastic Delayed Feedback

International Conference on Machine Learning (ICML), 2018

409

05 Jul 2018