Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models

3 November 2023

Papers citing "Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models"

The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model. Brenden Smith, Dallin Baker, Clayton Chase, Myles Barney, Kaden Parker, Makenna Allred, Peter Hu, Alex Evans, Nancy Fulda. 04 Jul 2024.
Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning. Yuansheng Xie, Soroush Vosoughi, Saeed Hassanpour. 30 Mar 2022.
Framework for Evaluating Faithfulness of Local Explanations. S. Dasgupta, Nave Frost, Michal Moshkovitz. 01 Feb 2022.
Interactively Providing Explanations for Transformer Language Models. Felix Friedrich, P. Schramowski, Christopher Tauchmann, Kristian Kersting. 02 Sep 2021.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
Towards A Rigorous Science of Interpretable Machine Learning. Finale Doshi-Velez, Been Kim. 28 Feb 2017.