Papers citing 'Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update'

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models. Sihao Wu, Gaojie Jin, Wei Huang, Jianhong Wang, Xiaowei Huang. 30 Aug 2025. [LLMSV]
Learning to Steer: Input-dependent Steering for Multimodal LLMs. Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, A. Newson, Matthieu Cord. 18 Aug 2025. [LLMSV]
A Survey on Training-free Alignment of Large Language Models. Birong Pan, Yongqi Li, Jiasheng Si, Sibo Wei, Mayi Xu, Shen Zhou, Yuanyuan Zhu, Ming Zhong, T. Qian. 12 Aug 2025. [3DV, LM&MA]
Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security. Muzhi Dai, Shixuan Liu, Zhiyuan Zhao, Junyu Gao, Hao Sun, Xuelong Li. 29 Jul 2025. [AAML]
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt. Yitong Zhang, Jia Li, L. Cai, Ge Li. 11 Jun 2025. [VLM]
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities. Fauzan Farooqui, Thy Thy Tran, Preslav Nakov, Iryna Gurevych. Volume 1 (V1), 2025. 31 May 2025. [MLLM, AAML]
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration. Fauzan Farooqui, Qing Li, Zongxiong Chen, Yuxia Wang, Derui Zhu, Zhuohan Xie, Chenyang Lyu, Xiuying Chen, Preslav Nakov, Fakhri Karray. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 26 May 2025. [VLM]
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations. Li Ji-An, Hua-Dong Xiong, Robert C. Wilson, Marcelo G. Mattar, M. Benna. 19 May 2025.
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models. Liqiang Jing, Guiming Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du. 04 May 2025.
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders. Qing Li, Fauzan Farooqui, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray. 16 Mar 2025. [MU]
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models. Fauzan Farooqui, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Longji Xu, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray. 22 Feb 2025. [MU]