Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability

3 July 2025

Luca Baroni, Galvin Khara, Joachim Schaeffer, Marat Subkhankulov, Stefan Heimersheim