Beyond Benchmarks: On The False Promise of AI Regulation

26 January 2025

Main: 5 pages, 2 figures; Bibliography: 2 pages

Abstract

The performance of AI models on safety benchmarks does not indicate their real-world performance after deployment. This opacity of AI models undermines existing regulatory frameworks built on benchmark performance, leaving them incapable of mitigating ongoing real-world harm. The problem stems from a fundamental challenge in AI interpretability, which regulators and decision makers seem to overlook. We propose a simple, realistic, and readily usable regulatory framework that does not rely on benchmarks, and call for interdisciplinary collaboration to find new ways to address this crucial problem.
