2105.07109

The Low-Dimensional Linear Geometry of Contextualized Word Representations

Conference on Computational Natural Language Learning (CoNLL), 2021

15 May 2021

Papers citing "The Low-Dimensional Linear Geometry of Contextualized Word Representations"

31 / 31 papers shown

Title
Toward Preference-aligned Large Language Models via Residual-based Model Steering | Lucio La Cava, Andrea Tagarelli | LLMSV | 120 0 0 | 28 Sep 2025
RepIt: Steering Language Models with Concept-Specific Refusal Vectors | Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang | LLMSV | 193 0 0 | 16 Sep 2025
Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models | Michele Gubian, Ioana Krehan, Oli Danyi Liu, James Kirby, Sharon Goldwater | SSL | 244 0 0 | 12 Jun 2025
COSMIC: Generalized Refusal Direction Identification in LLM Activations | Annual Meeting of the Association for Computational Linguistics (ACL), 2025 | Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Yu Yang, Yang Liu, Dawn Song, Chenguang Wang | LLMSV | 195 5 0 | 30 May 2025
Linguistic Interpretability of Transformer-based Language Models: a systematic review | Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero | VLM | 312 5 0 | 09 Apr 2025
Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì, Andrea Seveso, Fabio Mercorio | LLMSV | 181 3 0 | 17 Feb 2025
Robust AI-Generated Text Detection by Restricted Embeddings | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 | Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey I. Nikolenko, Irina Piontkovskaya | DeLMO | 144 13 0 | 10 Oct 2024
Mechanistic? | BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024 | Naomi Saphra, Sarah Wiegreffe | AI4CE | 197 31 0 | 07 Oct 2024
Geometric Signatures of Compositionality Across a Language Model's Lifetime | Annual Meeting of the Association for Computational Linguistics (ACL), 2024 | Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng | CoGe | 537 8 0 | 02 Oct 2024
AutoML-guided Fusion of Entity and LLM-based representations | IFIP Working Conference on Database Semantics (IWDS), 2024 | Boshko Koloski, Senja Pollak, Roberto Navigli, Blaž Škrlj | 142 1 0 | 19 Aug 2024
Reasoning in Large Language Models: A Geometric Perspective | Romain Cosentino, Sarath Shekkizhar | LRM | 180 3 0 | 02 Jul 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces | S. Menary, Samuel Kaski, Andre Freitas | 167 2 0 | 25 Jun 2024
Refusal in Language Models Is Mediated by a Single Direction | Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda | 274 389 0 | 17 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations | Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, Sharon Goldwater | SSL | 221 8 0 | 13 Jun 2024
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation | International Conference on Machine Learning (ICML), 2023 | Randall Balestriero, Romain Cosentino, Sarath Shekkizhar | 274 5 0 | 04 Dec 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 | Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas | HILM | 232 50 0 | 27 Nov 2023
Outlier Dimensions Encode Task-Specific Knowledge | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 | William Rudman, Catherine Chen, Carsten Eickhoff | 211 8 0 | 26 Oct 2023
Bridging Information-Theoretic and Geometric Compression in Language Models | Emily Cheng, Corentin Kervadec, Marco Baroni | 296 24 0 | 20 Oct 2023
Uncovering hidden geometry in Transformers via disentangling position and context | Jiajun Song, Yiqiao Zhong | 198 12 0 | 07 Oct 2023
LEACE: Perfect linear concept erasure in closed form | Neural Information Processing Systems (NeurIPS), 2023 | Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Robert Bamler, Edward Raff, Stella Biderman | KELM MU | 641 163 0 | 06 Jun 2023
Semantic Composition in Visually Grounded Language Models | Rohan Pandey | CoGe | 169 1 0 | 15 May 2023
BrainBERT: Self-supervised representation learning for intracranial recordings | International Conference on Learning Representations (ICLR), 2023 | Christopher Wang, Vighnesh Subramaniam, A. Yaari, Gabriel Kreiman, Boris Katz, Ignacio Cases, Andrei Barbu | MedIm SSL | 236 59 0 | 28 Feb 2023
Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings | Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023 | Rohan Pandey | 259 1 0 | 21 Jan 2023
The Role of Interactive Visualization in Explaining (Large) NLP Models: from Data to Inference | R. Brath, Daniel A. Keim, Johannes Knittel, Shimei Pan, Pia Sommerauer, Hendrik Strobelt | 112 14 0 | 11 Jan 2023
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task | International Conference on Learning Representations (ICLR), 2022 | Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg | MILM | 444 374 0 | 24 Oct 2022
Reprint: a randomized extrapolation based on principal components for data augmentation | Social Science Research Network (SSRN), 2022 | Jiale Wei, Qiyuan Chen, Pai Peng, Benjamin Guedj, Le Li | 152 2 0 | 26 Apr 2022
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models | Findings (Findings), 2022 | Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster | AIMat | 182 36 0 | 17 Mar 2022
Kernelized Concept Erasure | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 | Shauli Ravfogel, Francisco Vargas, Yoav Goldberg, Robert Bamler | 322 45 0 | 28 Jan 2022
Linear Adversarial Concept Erasure | International Conference on Machine Learning (ICML), 2022 | Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Robert Bamler | KELM | 400 78 0 | 28 Jan 2022
Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 | Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, Vivek Srikumar | 178 15 0 | 23 Sep 2021
Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color | Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie Pavlick, Anders Søgaard | 164 154 0 | 13 Sep 2021