Title |
---|
Toward Optimal LLM Alignments Using Two-Player Games Rui Zheng Hongyi Guo Zhihan Liu Xiaoying Zhang Yuanshun Yao ... Tao Gui Qi Zhang Xuanjing Huang Hang Li Yang Liu |
Transfer Q Star: Principled Decoding for LLM Alignment Souradip Chakraborty Soumya Suvra Ghosal Ming Yin Dinesh Manocha Mengdi Wang Amrit Singh Bedi Furong Huang |
Offline Regularised Reinforcement Learning for Large Language Models Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ... Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment Keming Lu Bowen Yu Fei Huang Yang Fan Runji Lin Chang Zhou |