Teams of LLM Agents can Exploit Zero-Day Vulnerabilities (arXiv:2406.01637)

2 June 2024
Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, R. Bindu, Daniel Kang
LLMAG

Papers citing "Teams of LLM Agents can Exploit Zero-Day Vulnerabilities"

27 / 27 papers shown
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
A. Happe, Jürgen Cito
14 Apr 2025

Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song
07 Apr 2025

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang
LLMAG, ELM
21 Mar 2025

To Patch or Not to Patch: Motivations, Challenges, and Implications for Cybersecurity
Jason R. C. Nurse
24 Feb 2025

Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing
Masaya Kobayashi, Masane Fuchi, Amar Zanashir, Tomonori Yoneda, Tomohiro Takagi
LLMAG
24 Feb 2025

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
I. Isozaki, Manil Shrestha, Rick Console, Edward Kim
ELM
24 Feb 2025

The AI Agent Index
Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, ..., Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt
03 Feb 2025

Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
Fred Heiding, Simon Lermen, Andrew Kao, B. Schneier, A. Vishwanath
30 Nov 2024

What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett, Lisa Thiergart
ELM
26 Nov 2024

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation
Peter Barnett, Lisa Thiergart
ELM
19 Nov 2024

Safety case template for frontier AI: A cyber inability argument
Arthur Goemans, Marie Davidsen Buhl, Jonas Schuett, Tomek Korbak, Jessica Wang, Benjamin Hilton, Geoffrey Irving
12 Nov 2024

Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese
AAML
28 Oct 2024

SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement
Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang
LLMAG
26 Oct 2024

Voice-Enabled AI Agents can Perform Common Scams
Richard Fang, Dylan Bowman, Daniel Kang
21 Oct 2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
LLMAG
11 Oct 2024

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
Simon Lermen, Mateusz Dziemian, Govind Pimpale
LLMAG
08 Oct 2024

The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
Merlin Stein, Jamie Bernardi, Connor Dunlop
07 Oct 2024

AutoPenBench: Benchmarking Generative Agents for Penetration Testing
Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, G. Siracusano, Roberto Bifulco
ELM
04 Oct 2024

A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
Stav Cohen, Ron Bitton, Ben Nassi
SILM
09 Aug 2024

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt
03 Jun 2024

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang, Carlos E. Jimenez, Alexander Wettig, K. Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
LLMAG
06 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang, R. Bindu, Akul Gupta, Daniel Kang
SILM, LLMAG
11 Apr 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang
LLMAG
05 Mar 2024

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
A. Happe, Aaron Kaplan, Jürgen Cito
17 Oct 2023

ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
LLMAG, ReLM, LRM
06 Oct 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022