Among Us: A Sandbox for Agentic Deception

5 April 2025

Papers citing "Among Us: A Sandbox for Agentic Deception"


Title
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde, Louis Jaburi
MILM 82 1 0
01 May 2025

Scaling Laws For Scalable Oversight
Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark
ELM 70 0 0
25 Apr 2025