2504.04072
Among Us: A Sandbox for Agentic Deception
5 April 2025
Satvik Golechha
Adrià Garriga-Alonso
LLMAG
Papers citing "Among Us: A Sandbox for Agentic Deception"
2 / 2 papers shown
Title
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde
Louis Jaburi
MILM
82
1
0
01 May 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
70
0
0
25 Apr 2025