
Beyond Benchmarks: On The False Promise of AI Regulation

Main: 5 Pages
2 Figures
Bibliography: 2 Pages
Abstract

The performance of AI models on safety benchmarks does not indicate their real-world performance after deployment. This opacity of AI models undermines existing regulatory frameworks built on benchmark performance, leaving them unable to mitigate ongoing real-world harm. The problem stems from a fundamental challenge in AI interpretability, one that regulators and decision makers seem to overlook. We propose a simple, realistic, and readily usable regulatory framework that does not rely on benchmarks, and we call for interdisciplinary collaboration to find new ways to address this crucial problem.
