From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP

18 June 2024

Papers citing "From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP"

Title
Aligned Probing: Relating Toxic Behavior and Model Internals; Andreas Waldis, Vagrant Gautam, Anne Lauscher, Dietrich Klakow, Iryna Gurevych; 45 0 0; 17 Mar 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors; Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura; 74 0 0; 27 Feb 2025
What Do Speech Foundation Models Not Learn About Speech?; Abdul Waheed, Hanin Atwany, Bhiksha Raj, Rita Singh; SSL; 35 1 0; 16 Oct 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small; Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt; 212 494 0; 01 Nov 2022