Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

20 February 2025

Papers citing "Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment"


Title
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
OffRL 48 23 0
20 Feb 2025

Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi
OffRL 48 2 0
13 Feb 2025

Theoretical guarantees on the best-of-n alignment policy
Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alex D'Amour, Jacob Eisenstein, Chirag Nagpal, A. Suresh
42 42 0
03 Jan 2024