2506.04250
Cited By
SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
1 June 2025
Shaona Ghosh
Amrita Bhattacharjee
Yftah Ziser
Christopher Parisien
LLMSV
Papers citing
"SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs"
3 / 3 papers shown
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Guy Bar-Shalom
Fabrizio Frasca
Yaniv Galron
Yftah Ziser
Haggai Maron
MLLM
0
0
0
30 Sep 2025
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Harethah Shairah
Hasan Hammoud
G. Turkiyyah
Bernard Ghanem
LLMSV
60
1
0
28 Aug 2025
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Changqing Li
Tianlin Li
Xiaohan Zhang
Aishan Liu
Li Pan
KELM
LLMSV
44
0
0
09 Aug 2025