Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization. International Conference on Learning Representations (ICLR), 2024
A Closer Look at Machine Unlearning for Large Language Models. International Conference on Learning Representations (ICLR), 2024