The Linear Representation Hypothesis and the Geometry of Large Language Models
arXiv:2311.03658 · 7 November 2023
Kiho Park
Yo Joong Choe
Victor Veitch
    LLMSV
    MILM

Papers citing "The Linear Representation Hypothesis and the Geometry of Large Language Models"

50 / 123 papers shown
On the Geometry of Semantics in Next-token Prediction
Yize Zhao
Christos Thrampoulidis
11
0
0
13 May 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
85
0
0
30 Apr 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
68
0
0
28 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David E. Evans
LLMSV
74
0
0
23 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
29
0
0
19 Apr 2025
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
36
0
0
17 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo
Noah A. Smith
Sarah Wiegreffe
Yanai Elazar
35
0
0
16 Apr 2025
Steering Prosocial AI Agents: Computational Basis of LLM's Decision Making in Social Simulation
Ji Ma
35
0
0
16 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
21
1
0
09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
32
0
0
07 Apr 2025
From Tokens to Lattices: Emergent Lattice Structures in Language Models
Bo Xiong
Steffen Staab
LRM
21
0
0
04 Apr 2025
Language Models Are Implicitly Continuous
Samuele Marro
Davide Evangelista
X. A. Huang
Emanuele La Malfa
M. Lombardi
Michael Wooldridge
26
0
0
04 Apr 2025
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
51
4
0
03 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
56
0
0
01 Apr 2025
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee
Adam Davies
Marc E. Canby
J. Hockenmaier
LLMSV
65
0
0
31 Mar 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
Youxiang Zhu
Ruochen Li
Danqing Wang
Daniel Haehn
Xiaohui Liang
LRM
55
1
0
30 Mar 2025
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee
Melanie Weber
F. Viégas
Martin Wattenberg
FedML
74
1
0
27 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
46
1
0
18 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
47
0
0
15 Mar 2025
Combining Causal Models for More Accurate Abstractions of Neural Networks
Theodora-Mara Pîslar
Sara Magliacane
Atticus Geiger
AI4CE
50
0
0
14 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Biwei Huang
Mingming Gong
Anton van den Hengel
Javen Qinfeng Shi
J. Shi
101
0
0
12 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
65
0
0
12 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
63
0
0
08 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting
Dominic Maggio
Luca Carlone
85
0
0
07 Mar 2025
How can representation dimension dominate structurally pruned LLMs?
Mingxue Xu
Lisa Alazraki
Danilo P. Mandic
56
0
0
06 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
75
2
0
03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu
R. Li
Yunzhe Qi
Hui Liu
X. Tang
...
Qingyu Yin
Monica Cheng
Jun Huan
Haoyu Wang
Jing Gao
KELM
43
2
0
01 Mar 2025
Enhancing Gradient-based Discrete Sampling via Parallel Tempering
Luxu Liang
Yuhang Jia
Feng Zhou
50
0
0
26 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
63
0
0
26 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan
Addison Salvador
Michelle A. Smeaton
Andrew Glaws
Hilary Egan
Brian C. Wyatt
Babak Anasori
K. Fiedler
M. Olszta
Steven Spurgeon
63
0
0
25 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
108
2
0
24 Feb 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager
Jannes Elstner
Simon Geisler
Vincent Cohen-Addad
Stephan Günnemann
Johannes Gasteiger
LLMSV
62
0
0
24 Feb 2025
Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV
102
0
0
21 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
123
1
0
18 Feb 2025
LUNAR: LLM Unlearning via Neural Activation Redirection
William F. Shen
Xinchi Qiu
Meghdad Kurmanji
Alex Iacob
Lorenzo Sani
Yihong Chen
Nicola Cancedda
Nicholas D. Lane
MU
51
1
0
11 Feb 2025
Constrained belief updates explain geometric structures in transformer representations
Mateusz Piotrowski
P. Riechers
Daniel Filan
A. Shai
74
0
0
04 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
L. Zhang
Dacheng Tao
81
3
0
31 Jan 2025
Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment
Pegah Khayatan
Mustafa Shukor
Jayneel Parekh
Matthieu Cord
LLMSV
38
1
0
06 Jan 2025
Representation in large language models
Cameron C. Yetman
41
1
0
03 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
80
4
0
31 Dec 2024
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
54
6
0
31 Dec 2024
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
114
3
0
29 Dec 2024
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Y. Wang
Hao Wang
77
1
0
23 Dec 2024
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser
Kristina Ulicna
Gemma Elyse Moran
Aditya Ravuri
Kian Kenyon-Dean
Cian Eastwood
Jason Hartford
76
0
0
20 Dec 2024
Does Representation Matter? Exploring Intermediate Layers in Large Language Models
Oscar Skean
Md Rifat Arefin
Yann LeCun
Ravid Shwartz-Ziv
79
7
0
12 Dec 2024
A gentle push funziona benissimo: making instructed models in Italian via contrastive activation steering
Daniel Scalena
Elisabetta Fersini
Malvina Nissim
LLMSV
70
0
0
27 Nov 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
75
10
0
21 Nov 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
52
3
0
17 Nov 2024
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations
Laura O'Mahony
Nikola S. Nikolov
David JP O'Sullivan
28
0
0
15 Nov 2024