Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games

13 February 2025

Papers citing "Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games"


Title
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment. Tong Yang, Jincheng Mei, H. Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai. 20 Feb 2025.
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF. Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai. Tag: OffRL. 20 Feb 2025.