2311.08379
Cited By
Scheming AIs: Will AIs fake alignment during training in order to get power?
14 November 2023
Joe Carlsmith
Papers citing "Scheming AIs: Will AIs fake alignment during training in order to get power?"
1 / 1 papers shown
Title
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward
11 Jun 2024