Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
arXiv:2405.05466 · 8 May 2024
Joshua Clymer, Caden Juang, Severin Field
Papers citing "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (3 papers):
Large Language Models Often Say One Thing and Do Another
Ruoxi Xu, Hongyu Lin, Xianpei Han, Jia Zheng, Weixiang Zhou, Le Sun, Yingfei Sun (10 Mar 2025)
Scheming AIs: Will AIs fake alignment during training in order to get power?
Joe Carlsmith (14 Nov 2023)
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark (10 Oct 2023)