Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.14433
Cited By
A Language Model's Guide Through Latent Space
22 February 2024
Dimitri von Rutte
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Language Model's Guide Through Latent Space"
23 / 23 papers shown
Title
Functional Abstraction of Knowledge Recall in Large Language Models
Zijian Wang
Chang Xu
KELM
32
0
0
20 Apr 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Y. Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
32
1
0
20 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
21
1
0
09 Apr 2025
Representation Bending for Large Language Model Safety
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
AAML
ALM
KELM
52
0
0
02 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
53
0
0
01 Apr 2025
Towards LLM Guardrails via Sparse Representation Steering
Zeqing He
Zhibo Wang
Huiyu Xu
Kui Ren
LLMSV
49
1
0
21 Mar 2025
DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification
Cho Hyeonsu
Dooyoung Kim
Youngjoong Ko
MoMe
31
0
0
17 Mar 2025
Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
Xinyu Tang
Xiaolei Wang
Zhihao Lv
Yingqian Min
Wayne Xin Zhao
Binbin Hu
Ziqi Liu
Zhiqiang Zhang
LRM
73
2
0
14 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
75
2
0
03 Mar 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
78
0
0
27 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
100
0
0
24 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
42
0
0
17 Feb 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
39
3
0
11 Nov 2024
Extracting Unlearned Information from LLMs with Activation Steering
Atakan Seyitoğlu
A. Kuvshinov
Leo Schwinn
Stephan Günnemann
MU
LLMSV
40
3
0
04 Nov 2024
Towards Inference-time Category-wise Safety Steering for Large Language Models
Amrita Bhattacharjee
Shaona Ghosh
Traian Rebedea
Christopher Parisien
LLMSV
23
2
0
02 Oct 2024
Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective
Van-Cuong Pham
Thien Huu Nguyen
LLMSV
35
3
0
16 Sep 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi
Oscar Obeso
Aaquib Syed
Daniel Paleka
Nina Panickssery
Wes Gurnee
Neel Nanda
45
130
0
17 Jun 2024
Specific versus General Principles for Constitutional AI
Sandipan Kundu
Yuntao Bai
Saurav Kadavath
Amanda Askell
Andrew Callahan
...
Zac Hatfield-Dodds
Sören Mindermann
Nicholas Joseph
Sam McCandlish
Jared Kaplan
AILaw
56
24
0
20 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
209
178
0
20 Oct 2023
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob
Yikang Shen
Gabriele Farina
Jacob Andreas
31
19
0
13 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
216
297
0
26 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
1