ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Home › Papers › 2402.06664 › Cited By
LLM Agents can Autonomously Hack Websites

6 February 2024
Richard Fang, R. Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
LLMAG
ArXiv · PDF · HTML

Papers citing "LLM Agents can Autonomously Hack Websites"

33 / 33 papers shown
- RedTeamLLM: an Agentic AI framework for offensive security
  Brian Challita, Pierre Parrend · LLMAG · 40 / 0 / 0 · 11 May 2025

- From Texts to Shields: Convergence of Large Language Models and Cybersecurity
  Tao Li, Ya-Ting Yang, Yunian Pan, Quanyan Zhu · 31 / 0 / 0 · 01 May 2025

- ARCeR: an Agentic RAG for the Automated Definition of Cyber Ranges
  Matteo Lupinacci, Francesco Blefari, Francesco Romeo, Francesco Aurelio Pironti, Angelo Furfaro · 26 / 0 / 0 · 16 Apr 2025

- CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
  Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang · LLMAG, ELM · 72 / 2 / 0 · 21 Mar 2025

- AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
  Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri · 61 / 1 / 0 · 12 Mar 2025

- AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
  Nicholas Carlini, Javier Rando, Edoardo Debenedetti, Milad Nasr, F. Tramèr · AAML, ELM · 39 / 1 / 0 · 03 Mar 2025

- Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing
  Masaya Kobayashi, Masane Fuchi, Amar Zanashir, Tomonori Yoneda, Tomohiro Takagi · LLMAG · 42 / 1 / 0 · 24 Feb 2025

- Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
  I. Isozaki, Manil Shrestha, Rick Console, Edward Kim · ELM · 65 / 4 / 0 · 24 Feb 2025

- A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
  Simeon Campos, Henry Papadatos, Fabien Roger, Chloé Touzet, Malcolm Murray, Otter Quarks · 76 / 2 / 0 · 20 Feb 2025

- OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
  Michael Kouremetis, Marissa Dotter, Alex Byrne, Dan Martin, Ethan Michalak, Gianpaolo Russo, Michael Threet, Guido Zarrella · ELM · 50 / 5 / 0 · 18 Feb 2025

- The AI Agent Index
  Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, ..., Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt · 68 / 4 / 0 · 03 Feb 2025

- HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
  Lajos Muzsai, David Imolai, András Lukács · LLMAG · 67 / 6 / 0 · 02 Dec 2024

- Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
  Fred Heiding, Simon Lermen, Andrew Kao, B. Schneier, A. Vishwanath · 66 / 6 / 0 · 30 Nov 2024

- What AI evaluations for preventing catastrophic risks can and cannot do
  Peter Barnett, Lisa Thiergart · ELM · 76 / 2 / 0 · 26 Nov 2024

- Voice-Enabled AI Agents can Perform Common Scams
  Richard Fang, Dylan Bowman, Daniel Kang · 21 / 0 / 0 · 21 Oct 2024

- The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks
  Daniel Ayzenshteyn, Roy Weiss, Yisroel Mirsky · AAML · 16 / 0 / 0 · 20 Oct 2024

- Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
  Andrey Anurin, Jonathan Ng, Kibo Schaffer, Jason Schreiber, Esben Kran · ELM · 27 / 5 / 0 · 10 Oct 2024

- Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
  Simon Lermen, Mateusz Dziemian, Govind Pimpale · LLMAG · 18 / 4 / 0 · 08 Oct 2024

- Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches
  Jamal N. Al-Karaki, Muhammad Al-Zafar Khan, Marwan Omar · 26 / 4 / 0 · 11 Sep 2024

- CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher
  Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim · 32 / 5 / 0 · 21 Aug 2024

- A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
  Stav Cohen, Ron Bitton, Ben Nassi · SILM · 33 / 5 / 0 · 09 Aug 2024

- Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
  Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang · 27 / 1 / 0 · 26 Jun 2024

- Adversaries Can Misuse Combinations of Safe Models
  Erik Jones, Anca Dragan, Jacob Steinhardt · 40 / 6 / 0 · 20 Jun 2024

- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
  Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, Richard Fang, R. Bindu, Daniel Kang · LLMAG · 33 / 27 / 0 · 02 Jun 2024

- When LLMs Meet Cybersecurity: A Systematic Literature Review
  Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu · 24 / 36 / 0 · 06 May 2024

- Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
  Denis Donadel, Francesco Marchiori, Luca Pajola, Mauro Conti · 27 / 7 / 0 · 19 Apr 2024

- A Safe Harbor for AI Evaluation and Red Teaming
  Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, ..., Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson · 49 / 37 / 0 · 07 Mar 2024

- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
  Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish · ALM · 15 / 80 / 0 · 31 Oct 2023

- LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
  A. Happe, Aaron Kaplan, Jürgen Cito · 19 / 15 / 0 · 17 Oct 2023

- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
  Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji · LRM · 125 / 137 / 0 · 19 Sep 2023

- Emergent autonomous scientific research capabilities of large language models
  Daniil A. Boiko, R. MacKnight, Gabe Gomes · ELM, LM&Ro, AI4CE, LLMAG · 101 / 115 / 0 · 11 Apr 2023

- ReAct: Synergizing Reasoning and Acting in Language Models
  Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao · LLMAG, ReLM, LRM · 233 / 2,413 / 0 · 06 Oct 2022

- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou · LM&Ro, LRM, AI4CE, ReLM · 315 / 8,261 / 0 · 28 Jan 2022