2503.14499
Measuring AI Ability to Complete Long Tasks
18 March 2025
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
Max Hasin
Sami Jawhar
Megan Kinniment
Nate Rush
Sydney Von Arx
Ryan Bloom
Thomas Broadley
Haoxing Du
Brian Goodrich
Nikola Jurkovic
Luke Harold Miles
Seraphina Nix
Tao R. Lin
Neev Parikh
David Rein
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
Papers citing "Measuring AI Ability to Complete Long Tasks"
4 / 4 papers shown
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
21
0
0
08 May 2025
Characterizing AI Agents for Alignment and Governance
Atoosa Kasirzadeh
Iason Gabriel
47
0
0
30 Apr 2025
Unraveling Human-AI Teaming: A Review and Outlook
Bowen Lou
Tian Lu
T. S. Raghu
Yingjie Zhang
26
0
0
08 Apr 2025
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak
Mikita Balesni
Buck Shlegeris
Geoffrey Irving
ELM
27
1
0
07 Apr 2025