arXiv:2311.01732
Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models
3 November 2023
Sean Xie
Soroush Vosoughi
Saeed Hassanpour
Papers citing
"Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models"
6 / 6 papers shown
Title
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
14
0
0
04 Jul 2024
Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning
Yuansheng Xie
Soroush Vosoughi
Saeed Hassanpour
14
2
0
30 Mar 2022
Framework for Evaluating Faithfulness of Local Explanations
S. Dasgupta
Nave Frost
Michal Moshkovitz
FAtt
106
60
0
01 Feb 2022
Interactively Providing Explanations for Transformer Language Models
Felix Friedrich
P. Schramowski
Christopher Tauchmann
Kristian Kersting
LRM
31
6
0
02 Sep 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
225
3,658
0
28 Feb 2017