AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

3 March 2025
Nicholas Carlini
Javier Rando
Edoardo Debenedetti
Milad Nasr
Florian Tramèr
    AAML
    ELM
Abstract

We introduce AutoAdvExBench, a benchmark to evaluate whether large language models (LLMs) can autonomously exploit defenses to adversarial examples. Unlike existing security benchmarks that often serve as proxies for real-world tasks, AutoAdvExBench directly measures LLMs' success on tasks regularly performed by machine learning security experts. This approach offers a significant advantage: if an LLM could solve the challenges presented in AutoAdvExBench, it would immediately present practical utility for adversarial machine learning researchers. We then design a strong agent that is capable of breaking 75% of CTF-like ("homework exercise") adversarial example defenses. However, we show that this agent succeeds on only 13% of the real-world defenses in our benchmark, indicating the large gap in difficulty between attacking "real" code and CTF-like code. In contrast, a stronger LLM that can attack 21% of real defenses only succeeds on 54% of CTF-like defenses. We make this benchmark available at this https URL.
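"Breaking" a defense here means the standard task in this literature: crafting perturbed inputs that the defended model misclassifies. For readers unfamiliar with that task, the following is a minimal sketch of a projected gradient descent (PGD) attack, a common baseline technique. It is not the paper's agent or the AutoAdvExBench harness; the toy model, epsilon, step size, and iteration count are illustrative assumptions.

# Illustrative sketch only: a standard PGD attack against a toy "defended"
# classifier. A real AutoAdvExBench entry would wrap a published defense's
# own preprocessing and model code instead of the random CNN below.
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Maximize cross-entropy loss within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # project into eps-ball
            x_adv = x_adv.clamp(0, 1)                      # keep valid pixel range
    return x_adv.detach()

if __name__ == "__main__":
    # Stand-in "defense": a randomly initialized CNN (hypothetical placeholder).
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
    x = torch.rand(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    x_adv = pgd_attack(model, x, y)
    print("clean accuracy:", (model(x).argmax(1) == y).float().mean().item())
    print("adv accuracy:  ", (model(x_adv).argmax(1) == y).float().mean().item())

In the benchmark's real-world setting, an agent must additionally read each defense's published code and adapt such an attack to its particular preprocessing and gradient handling, which the abstract identifies as far harder than the CTF-like setting.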

@article{carlini2025_2503.01811,
  title={AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses},
  author={Nicholas Carlini and Javier Rando and Edoardo Debenedetti and Milad Nasr and Florian Tramèr},
  journal={arXiv preprint arXiv:2503.01811},
  year={2025}
}