Beyond Benchmarks: On The False Promise of AI Regulation
Main:5 Pages
2 Figures
Bibliography:2 Pages
Abstract
The performance of AI models on safety benchmarks does not indicate how they will perform in the real world after deployment. This opacity undermines existing regulatory frameworks built on benchmark performance, leaving them unable to mitigate ongoing real-world harm. The problem stems from a fundamental challenge in AI interpretability that regulators and decision makers seem to overlook. We propose a simple, realistic, and readily usable regulatory framework that does not rely on benchmarks, and we call for interdisciplinary collaboration to find new ways to address this crucial problem.
