ResearchTrend.AI
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

8 May 2024
Joshua Clymer, Caden Juang, Severin Field
    CVBM
arXiv:2405.05466

Papers citing "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals"

3 of 3 papers shown
Large Language Models Often Say One Thing and Do Another
Ruoxi Xu, Hongyu Lin, Xianpei Han, Jia Zheng, Weixiang Zhou, Le Sun, Yingfei Sun
10 Mar 2025
Scheming AIs: Will AIs fake alignment during training in order to get power?
Joe Carlsmith
14 Nov 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
    HILM
10 Oct 2023