Extracting Paragraphs from LLM Token Activations

10 September 2024

Papers citing "Extracting Paragraphs from LLM Token Activations"

3 / 3 papers shown

Title
Robustly identifying concepts introduced during chat fine-tuning using crosscoders Julian Minder Clement Dumas Caden Juang Bilal Chugtai Neel Nanda 25 0 0 03 Apr 2025
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 486 0 01 Nov 2022
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models Nandan Thakur Nils Reimers Andreas Rucklé Abhishek Srivastava Iryna Gurevych VLM 229 720 0 17 Apr 2021