Probing the Vulnerability of Large Language Models to Polysemantic Interventions

16 May 2025 · Bofan Gong, Shiyang Lai, Dawn Song
Communities: AAML, MILM
arXiv: 2505.11611

Papers citing "Probing the Vulnerability of Large Language Models to Polysemantic Interventions"

4 papers:

1. Layers at Similar Depths Generate Similar Activations Across LLM Architectures
   Christopher Wolfram, Aaron Schein
   03 Apr 2025

2. LLM Social Simulations Are a Promising Research Method
   Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, James A. Evans, Erik Brynjolfsson, Michael S. Bernstein
   Communities: ALM
   03 Apr 2025

3. Shared Global and Local Geometry of Language Model Embeddings
   Andrew Lee, Melanie Weber, F. Viégas, Martin Wattenberg
   Communities: FedML
   27 Mar 2025

4. Sparse Autoencoders Can Interpret Randomly Initialized Transformers
   Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison
   29 Jan 2025