ResearchTrend.AI

SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs

1 June 2025
Shaona Ghosh
Amrita Bhattacharjee
Yftah Ziser
Christopher Parisien
LLMSV
arXiv: 2506.04250

Papers citing "SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs"

3 papers
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Guy Bar-Shalom
Fabrizio Frasca
Yaniv Galron
Yftah Ziser
Haggai Maron
MLLM
30 Sep 2025

Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Harethah Shairah
Hasan Hammoud
G. Turkiyyah
Bernard Ghanem
LLMSV
28 Aug 2025

MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Changqing Li
Tianlin Li
Xiaohan Zhang
Aishan Liu
Li Pan
KELM LLMSV
09 Aug 2025