Title |
---|
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm (Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, ... Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang) |
Offline Regularised Reinforcement Learning for Large Language Models Alignment (Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ... Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot) |
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment (Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou) |