Title |
---|
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
![]() A Survey on Self-Evolution of Large Language Models Zhengwei Tao Ting-En Lin Xiancai Chen Hangyu Li Yuchuan Wu Yongbin Li Zhi Jin Fei Huang Dacheng Tao Jingren Zhou |