Title |
---|
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment. Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, ..., Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo |
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback. Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi |