Fast Proxies for LLM Robustness Evaluation

14 February 2025
Tim Beyer
Jan Schuchardt
Leo Schwinn
Stephan Günnemann
    AAML
Abstract

Evaluating the robustness of LLMs to adversarial attacks is crucial for safe deployment, yet current red-teaming methods are often prohibitively expensive. We compare the ability of fast proxy metrics to predict the real-world robustness of an LLM against a simulated attacker ensemble. This allows us to estimate a model's robustness to computationally expensive attacks without requiring runs of the attacks themselves. Specifically, we consider gradient-descent-based embedding-space attacks, prefilling attacks, and direct prompting. Even though direct prompting in particular does not achieve high ASR, we find that it and embedding-space attacks can predict attack success rates well, achieving $r_p = 0.87$ (linear) and $r_s = 0.94$ (Spearman rank) correlations with the full attack ensemble while reducing computational cost by three orders of magnitude.
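
The two correlation figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the linear correlation is the standard Pearson coefficient and that each model contributes one proxy ASR and one ensemble ASR. The function names and the per-model numbers below are hypothetical placeholders, not values or code from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model attack success rates (illustrative only).
proxy_asr = [0.02, 0.05, 0.10, 0.20, 0.40]     # cheap proxy, e.g. direct prompting
ensemble_asr = [0.30, 0.45, 0.50, 0.80, 0.95]  # expensive attack ensemble

r_p = pearson(proxy_asr, ensemble_asr)
r_s = spearman(proxy_asr, ensemble_asr)
print(f"r_p = {r_p:.2f}, r_s = {r_s:.2f}")
```

A proxy can have a low absolute ASR and still rank models in nearly the same order as the ensemble, which is what a high $r_s$ captures. $r_p$ additionally requires the relationship to be close to linear.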

@article{beyer2025_2502.10487,
  title={Fast Proxies for LLM Robustness Evaluation},
  author={Tim Beyer and Jan Schuchardt and Leo Schwinn and Stephan Günnemann},
  journal={arXiv preprint arXiv:2502.10487},
  year={2025}
}