To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models

15 October 2025

Papers citing "To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models"


In-Distribution Steering: Balancing Control and Coherence in Language Model Generation
  Arthur Vogels, Benjamin Wong, Yann Choho, A. Blangero, Milan Bhan
  LLMSV | 80 0 0 | 15 Oct 2025

Did I Faithfully Say What I Thought? Bridging the Gap Between Neural Activity and Self-Explanations in Large Language Models
  Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Sarath Chandar, Marie-Jeanne Lesot
  LRM | 161 0 0 | 10 Jun 2025