Teams of LLM Agents can Exploit Zero-Day Vulnerabilities (arXiv:2406.01637)

2 June 2024
Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, R. Bindu, Daniel Kang
LLMAG

Papers citing "Teams of LLM Agents can Exploit Zero-Day Vulnerabilities"

27 / 27 papers shown
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
A. Happe, Jürgen Cito
14 Apr 2025

Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song
07 Apr 2025

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang
LLMAG, ELM
21 Mar 2025

To Patch or Not to Patch: Motivations, Challenges, and Implications for Cybersecurity
Jason R. C. Nurse
24 Feb 2025

Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing
Masaya Kobayashi, Masane Fuchi, Amar Zanashir, Tomonori Yoneda, Tomohiro Takagi
LLMAG
24 Feb 2025

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
I. Isozaki, Manil Shrestha, Rick Console, Edward Kim
ELM
24 Feb 2025

The AI Agent Index
Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, ..., Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt
03 Feb 2025

Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
Fred Heiding, Simon Lermen, Andrew Kao, B. Schneier, A. Vishwanath
30 Nov 2024

What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett, Lisa Thiergart
ELM
26 Nov 2024

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation
Peter Barnett, Lisa Thiergart
ELM
19 Nov 2024

Safety case template for frontier AI: A cyber inability argument
Arthur Goemans, Marie Davidsen Buhl, Jonas Schuett, Tomek Korbak, Jessica Wang, Benjamin Hilton, Geoffrey Irving
12 Nov 2024

Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese
AAML
28 Oct 2024

SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement
Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang
LLMAG
26 Oct 2024

Voice-Enabled AI Agents can Perform Common Scams
Richard Fang, Dylan Bowman, Daniel Kang
21 Oct 2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
LLMAG
11 Oct 2024

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
Simon Lermen, Mateusz Dziemian, Govind Pimpale
LLMAG
08 Oct 2024

The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
Merlin Stein, Jamie Bernardi, Connor Dunlop
07 Oct 2024

AutoPenBench: Benchmarking Generative Agents for Penetration Testing
Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, G. Siracusano, Roberto Bifulco
ELM
04 Oct 2024

A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
Stav Cohen, Ron Bitton, Ben Nassi
SILM
09 Aug 2024

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt
03 Jun 2024

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang, Carlos E. Jimenez, Alexander Wettig, K. Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
LLMAG
06 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang, R. Bindu, Akul Gupta, Daniel Kang
SILM, LLMAG
11 Apr 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang
LLMAG
05 Mar 2024

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
A. Happe, Aaron Kaplan, Jürgen Cito
17 Oct 2023

ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
LLMAG, ReLM, LRM
06 Oct 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022