The Linear Representation Hypothesis and the Geometry of Large Language Models
arXiv:2311.03658 · 7 November 2023
Kiho Park
Yo Joong Choe
Victor Veitch
    LLMSV
    MILM

Papers citing "The Linear Representation Hypothesis and the Geometry of Large Language Models"

50 / 123 papers shown
On the Geometry of Semantics in Next-token Prediction
Yize Zhao
Christos Thrampoulidis
11
0
0
13 May 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
85
0
0
30 Apr 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
68
0
0
28 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David E. Evans
LLMSV
74
0
0
23 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
29
0
0
19 Apr 2025
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
36
0
0
17 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo
Noah A. Smith
Sarah Wiegreffe
Yanai Elazar
35
0
0
16 Apr 2025
Steering Prosocial AI Agents: Computational Basis of LLM's Decision Making in Social Simulation
Ji Ma
35
0
0
16 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
21
1
0
09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
32
0
0
07 Apr 2025
From Tokens to Lattices: Emergent Lattice Structures in Language Models
Bo Xiong
Steffen Staab
LRM
21
0
0
04 Apr 2025
Language Models Are Implicitly Continuous
Samuele Marro
Davide Evangelista
X. A. Huang
Emanuele La Malfa
M. Lombardi
Michael Wooldridge
26
0
0
04 Apr 2025
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
51
4
0
03 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
56
0
0
01 Apr 2025
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee
Adam Davies
Marc E. Canby
J. Hockenmaier
LLMSV
65
0
0
31 Mar 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
Youxiang Zhu
Ruochen Li
Danqing Wang
Daniel Haehn
Xiaohui Liang
LRM
55
1
0
30 Mar 2025
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee
Melanie Weber
F. Viégas
Martin Wattenberg
FedML
74
1
0
27 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
46
1
0
18 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
47
0
0
15 Mar 2025
Combining Causal Models for More Accurate Abstractions of Neural Networks
Theodora-Mara Pîslar
Sara Magliacane
Atticus Geiger
AI4CE
50
0
0
14 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Biwei Huang
Mingming Gong
Anton van den Hengel
Javen Qinfeng Shi
J. Shi
101
0
0
12 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
65
0
0
12 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
63
0
0
08 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting
Dominic Maggio
Luca Carlone
85
0
0
07 Mar 2025
How can representation dimension dominate structurally pruned LLMs?
Mingxue Xu
Lisa Alazraki
Danilo P. Mandic
56
0
0
06 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
75
2
0
03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu
R. Li
Yunzhe Qi
Hui Liu
X. Tang
...
Qingyu Yin
Monica Cheng
Jun Huan
Haoyu Wang
Jing Gao
KELM
43
2
0
01 Mar 2025
Enhancing Gradient-based Discrete Sampling via Parallel Tempering
Luxu Liang
Yuhang Jia
Feng Zhou
50
0
0
26 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
63
0
0
26 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan
Addison Salvador
Michelle A. Smeaton
Andrew Glaws
Hilary Egan
Brian C. Wyatt
Babak Anasori
K. Fiedler
M. Olszta
Steven Spurgeon
63
0
0
25 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
108
2
0
24 Feb 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager
Jannes Elstner
Simon Geisler
Vincent Cohen-Addad
Stephan Günnemann
Johannes Gasteiger
LLMSV
62
0
0
24 Feb 2025
Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV
102
0
0
21 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
123
1
0
18 Feb 2025
LUNAR: LLM Unlearning via Neural Activation Redirection
William F. Shen
Xinchi Qiu
Meghdad Kurmanji
Alex Iacob
Lorenzo Sani
Yihong Chen
Nicola Cancedda
Nicholas D. Lane
MU
51
1
0
11 Feb 2025
Constrained belief updates explain geometric structures in transformer representations
Mateusz Piotrowski
P. Riechers
Daniel Filan
A. Shai
74
0
0
04 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
L. Zhang
Dacheng Tao
81
3
0
31 Jan 2025
Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment
Pegah Khayatan
Mustafa Shukor
Jayneel Parekh
Matthieu Cord
LLMSV
38
1
0
06 Jan 2025
Representation in large language models
Cameron C. Yetman
41
1
0
03 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
80
4
0
31 Dec 2024
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
54
6
0
31 Dec 2024
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
114
3
0
29 Dec 2024
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Y. Wang
Hao Wang
77
1
0
23 Dec 2024
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser
Kristina Ulicna
Gemma Elyse Moran
Aditya Ravuri
Kian Kenyon-Dean
Cian Eastwood
Jason Hartford
76
0
0
20 Dec 2024
Does Representation Matter? Exploring Intermediate Layers in Large Language Models
Oscar Skean
Md Rifat Arefin
Yann LeCun
Ravid Shwartz-Ziv
79
7
0
12 Dec 2024
A gentle push funziona benissimo: making instructed models in Italian via contrastive activation steering
Daniel Scalena
Elisabetta Fersini
Malvina Nissim
LLMSV
70
0
0
27 Nov 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
75
10
0
21 Nov 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
52
3
0
17 Nov 2024
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations
Laura O'Mahony
Nikola S. Nikolov
David JP O'Sullivan
28
0
0
15 Nov 2024