How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Transactions of the Association for Computational Linguistics (TACL), 2022
7 June 2022
Timothee Mickus, Denis Paperno, Mathieu Constant

Papers citing "How to Dissect a Muppet: The Structure of Transformer Embedding Spaces"

17 papers shown

 1. From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification
    M. S. Khorshidi, Navid Yazdanjue, Hassan Gharoun, M. Nikoo, Fang Chen, Amir H. Gandomi
    16 Sep 2025

 2. Iterative Inference in a Chess-Playing Neural Network
    Elias Sandmann, Sebastian Lapuschkin, Wojciech Samek
    29 Aug 2025

 3. Universal Jailbreak Suffixes Are Strong Attention Hijackers
    Matan Ben-Tov, Mor Geva, Mahmood Sharif
    15 Jun 2025

 4. Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
    Annual Meeting of the Association for Computational Linguistics (ACL), 2025
    Elena Sofia Ruzzetti, Giancarlo A. Xompero, Davide Venditti, Fabio Massimo Zanzotto
    09 Jun 2025

 5. Linguistic Interpretability of Transformer-based Language Models: a systematic review
    Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero
    09 Apr 2025

 6. Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
    Xiaojian Li, Yongkang Leng, Ruiqing Ding, Hangjie Mo, Shanlin Yang
    15 Mar 2025

 7. What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis
    Peiran Wang, Yang Liu, Yunfei Lu, Jue Hong, Ye Wu
    20 Feb 2025

 8. Transformer Normalisation Layers and the Independence of Semantic Subspaces
    S. Menary, Samuel Kaski, Andre Freitas
    25 Jun 2024

 9. Isotropy, Clusters, and Classifiers
    Annual Meeting of the Association for Computational Linguistics (ACL), 2024
    Timothee Mickus, Stig-Arne Gronroos, Joseph Attieh
    05 Feb 2024

10. The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Haiqin Yang
    18 Oct 2023

11. Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings
    BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
    Timothee Mickus, Ananda Sreenidhi
    10 Oct 2023

12. Explaining How Transformers Use Context to Build Predictions
    Annual Meeting of the Association for Computational Linguistics (ACL), 2023
    Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà
    21 May 2023

13. Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
    Annual Meeting of the Association for Computational Linguistics (ACL), 2023
    Byung-Doh Oh, William Schuler
    17 May 2023

14. Dissecting Recall of Factual Associations in Auto-Regressive Language Models
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
    28 Apr 2023

15. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
    International Conference on Language Resources and Evaluation (LREC), 2023
    Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
    16 Mar 2023

16. Understanding Transformer Memorization Recall Through Idioms
    Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
    Adi Haviv, Ido Cohen, Jacob Gidron, R. Schuster, Yoav Goldberg, Mor Geva
    07 Oct 2022

17. Analyzing Transformers in Embedding Space
    Annual Meeting of the Association for Computational Linguistics (ACL), 2022
    Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
    06 Sep 2022