Beyond Hidden-Layer Manipulation: Semantically-Aware Logit Interventions for Debiasing LLMs

25 October 2025

Wei Xia

Main: 4 pages, 4 figures, 3 tables; bibliography: 2 pages

Abstract

We propose Static and Dynamic, two zero-shot debiasing methods that intervene at the logits layer. Dynamic reduces bias by up to 70% with minimal loss of fluency. Logits-layer intervention outperforms hidden-layer approaches. Our results show that semantically aware logits intervention is stable and effective for debiasing aligned LLMs.
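The abstract does not say how Static or Dynamic compute their adjustments, so the following is only a minimal sketch of what intervening at the logits layer can mean in general: next-token scores are adjusted just before the softmax, and the hidden states are left untouched. The function `penalize_logits`, the fixed `penalty`, the toy logits, and the flagged token index are all hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize_logits(logits, flagged_ids, penalty=2.0):
    """Subtract a fixed penalty from the logits of flagged token ids.

    Hypothetical illustration only: a generic logits-level adjustment,
    not the paper's Static or Dynamic method.
    """
    flagged = set(flagged_ids)
    return [x - penalty if i in flagged else x for i, x in enumerate(logits)]

# Toy vocabulary of four tokens; token 0 is assumed to be the flagged one.
logits = [2.0, 1.0, 0.5, 0.0]
before = softmax(logits)
after = softmax(penalize_logits(logits, flagged_ids=[0], penalty=2.0))
```

In this toy setup the flagged token loses probability mass and the remaining tokens gain it proportionally, while the distribution still sums to one. A context-dependent variant would presumably choose the flagged set or the penalty per decoding step, but the abstract gives no details on that.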
