Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update
AAAI Conference on Artificial Intelligence (AAAI), 2025
24 January 2025 · arXiv: 2501.16378
Qing Li, Fauzan Farooqui, Zongxiong Chen, Kun Song, Lei Ma, Fakhri Karray

Papers citing "Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update"

11 citing papers
Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models
Sihao Wu, Gaojie Jin, Wei Huang, Jianhong Wang, Xiaowei Huang
30 Aug 2025
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, A. Newson, Matthieu Cord
18 Aug 2025
A Survey on Training-free Alignment of Large Language Models
Birong Pan, Yongqi Li, Jiasheng Si, Sibo Wei, Mayi Xu, Shen Zhou, Yuanyuan Zhu, Ming Zhong, T. Qian
12 Aug 2025
Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security
Muzhi Dai, Shixuan Liu, Zhiyuan Zhao, Junyu Gao, Hao Sun, Xuelong Li
29 Jul 2025
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang, Jia Li, L. Cai, Ge Li
11 Jun 2025
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
Volume 1 (V1), 2025
Fauzan Farooqui, Thy Thy Tran, Preslav Nakov, Iryna Gurevych
31 May 2025
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Fauzan Farooqui, Qing Li, Zongxiong Chen, Yuxia Wang, Derui Zhu, Zhuohan Xie, Chenyang Lyu, Xiuying Chen, Preslav Nakov, Fakhri Karray
26 May 2025
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Li Ji-An, Hua-Dong Xiong, Robert C. Wilson, Marcelo G. Mattar, M. Benna
19 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing, Guiming Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du
04 May 2025
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Qing Li, Fauzan Farooqui, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray
16 Mar 2025
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Fauzan Farooqui, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Longji Xu, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray
22 Feb 2025