Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

9 October 2024

Jiancheng Liu

Sijia Liu

Papers citing "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"

Title
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation Stefan Vasilev Christian Herold Baohao Liao Seyyed Hadi Hashemi Shahram Khadivi Christof Monz MU 35 0 0 09 May 2025
A mean teacher algorithm for unlearning of language models Yegor Klochkov MU 58 0 0 18 Apr 2025
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs Aashiq Muhamed Jacopo Bonato Mona Diab Virginia Smith MU 37 0 0 11 Apr 2025
Bridging the Gap Between Preference Alignment and Machine Unlearning Xiaohua Feng Yuyuan Li Huwei Ji Jiaming Zhang L. Zhang Tianyu Du Chaochao Chen MU 35 0 0 09 Apr 2025
Understanding Machine Unlearning Through the Lens of Mode Connectivity Jiali Cheng Hadi Amiri MU 30 0 0 08 Apr 2025
Not All Data Are Unlearned Equally Aravind Krishnan Siva Reddy Marius Mosbach MU 36 0 0 07 Apr 2025
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning Yiwei Chen Yuguang Yao Yihua Zhang Bingquan Shen Gaowen Liu Sijia Liu AAML MU 52 1 0 14 Mar 2025
Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models Huazheng Wang Yongcheng Jing Haifeng Sun Yingjie Wang J. Wang Jianxin Liao Dacheng Tao KELM MU 42 0 0 27 Feb 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning J. Ren Zhenwei Dai X. Tang Hui Liu Jingying Zeng ... R. Goutam Suhang Wang Yue Xing Qi He Hui Liu MU 91 1 0 25 Feb 2025