2406.12618
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
18 June 2024
Marius Mosbach
Vagrant Gautam
Tomás Vergara-Browne
Dietrich Klakow
Mor Geva
AI4CE
Papers citing "From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP"
4 / 4 papers shown
Title
Aligned Probing: Relating Toxic Behavior and Model Internals
Andreas Waldis
Vagrant Gautam
Anne Lauscher
Dietrich Klakow
Iryna Gurevych
45
0
0
17 Mar 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Eiji Aramaki
Tomoya Iwakura
74
0
0
27 Feb 2025
What Do Speech Foundation Models Not Learn About Speech?
Abdul Waheed
Hanin Atwany
Bhiksha Raj
Rita Singh
SSL
35
1
0
16 Oct 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
494
0
01 Nov 2022