Variational Best-of-N Alignment

8 July 2024

Ryan Cotterell

Papers citing "Variational Best-of-N Alignment"

Title
Soft Best-of-n Sampling for Model Alignment C. M. Verdun Alex Oesterling Himabindu Lakkaraju Flavio du Pin Calmon BDL 50 0 0 06 May 2025
Semantic Probabilistic Control of Language Models Kareem Ahmed Catarina G Belém Padhraic Smyth Sameer Singh 35 0 0 04 May 2025
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo João Loula Benjamin LeBrun Li Du Ben Lipkin Clemente Pasti ... Ryan Cotterell Vikash K. Mansinghka Alexander K. Lew Tim Vieira Timothy J. O'Donnell 29 1 0 17 Apr 2025
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection Souradip Chakraborty Mohammadreza Pourreza Ruoxi Sun Yiwen Song Nino Scherrer ... Furong Huang Amrit Singh Bedi Ahmad Beirami Hamid Palangi Tomas Pfister 46 0 0 02 Apr 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning Yunhao Tang Kunhao Zheng Gabriel Synnaeve Rémi Munos 34 0 0 25 Mar 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment Tong Yang Jincheng Mei H. Dai Zixin Wen Shicong Cen Dale Schuurmans Yuejie Chi Bo Dai 36 4 0 20 Feb 2025
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Zhiyuan Zeng Qinyuan Cheng Zhangyue Yin Yunhua Zhou Xipeng Qiu LRM 75 6 0 17 Feb 2025
Fast Best-of-N Decoding via Speculative Rejection Hanshi Sun Momin Haider Ruiqi Zhang Huitao Yang Jiahao Qiu Ming Yin Mengdi Wang Peter L. Bartlett Andrea Zanette BDL 40 26 0 26 Oct 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling Jiahao Qiu Yifu Lu Yifan Zeng Jiacheng Guo Jiayi Geng Huazheng Wang Kaixuan Huang Yue Wu Mengdi Wang 34 22 0 18 Oct 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment Enyu Zhou Guodong Zheng B. Wang Zhiheng Xi Shihan Dou ... Yurong Mou Rui Zheng Tao Gui Qi Zhang Xuanjing Huang ALM 52 13 0 13 Oct 2024
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation Wei Shen Chuheng Zhang OffRL 28 6 0 11 Sep 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates Hui Wei Shenghua He Tian Xia Andy H. Wong Jingyang Lin Mei Han ALM ELM 47 22 0 23 Aug 2024
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling Lin Gui Cristina Garbacea Victor Veitch BDL LM&MA 36 35 0 02 Jun 2024
Asymptotics of Language Model Alignment Joy Qiping Yang Salman Salamatian Ziteng Sun A. Suresh Ahmad Beirami 61 21 0 02 Apr 2024
Theoretical guarantees on the best-of-n alignment policy Ahmad Beirami Alekh Agarwal Jonathan Berant Alex D'Amour Jacob Eisenstein Chirag Nagpal A. Suresh 42 42 0 03 Jan 2024
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022