Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

23 May 2025

Papers citing "Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives"

Title | Authors | Tag | Counts | Date
ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs | Zeming Wei, Chengcan Wu, Meng Sun | | 59 0 0 | 02 Jun 2025
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval | Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | AAML | 71 2 0 | 21 May 2025
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models | Pin-Yu Chen, Han Shen, Payel Das, Tianyi Chen | | 97 4 0 | 24 Mar 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates | Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora | ALM | 133 59 0 | 20 Jan 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation | Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang | | 131 19 0 | 03 Jan 2025
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | AAML | 206 222 0 | 02 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts | | 158 106 0 | 26 Mar 2024