ResearchTrend.AI

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

19 April 2024
Manish P Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe
    ELM

Papers citing "CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models"

29 / 29 papers shown
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri
AAML · 22 Apr 2025

Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
A. Happe, Jürgen Cito
14 Apr 2025
SandboxEval: Towards Securing Test Environment for Untrusted Code
Rafiqul Rabin, Jesse Hostetler, Sean McGregor, Brett Weir, Nick Judd
ELM · 27 Mar 2025

A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez, Raluca Ada Popa, Four Flynn, Lihao Liang, Allan Dafoe, Anna Wang
ELM · 14 Mar 2025
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini, Javier Rando, Edoardo Debenedetti, Milad Nasr, F. Tramèr
AAML, ELM · 03 Mar 2025

OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Michael Kouremetis, Marissa Dotter, Alex Byrne, Dan Martin, Ethan Michalak, Gianpaolo Russo, Michael Threet, Guido Zarrella
ELM · 18 Feb 2025
A Contemporary Survey of Large Language Model Assisted Program Analysis
Jiayimei Wang, Tao Ni, Wei-Bin Lee, Qingchuan Zhao
05 Feb 2025

LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations
Ziyang Ye, T. H. Le, Muhammad Ali Babar
04 Feb 2025

CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation
Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray
ELM · 14 Jan 2025
Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs
Samuele Pasini, Jinhan Kim, Tommaso Aiello, Rocío Cabrera Lozoya, Antonino Sabetta, Paolo Tonella
27 Nov 2024

What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett, Lisa Thiergart
ELM · 26 Nov 2024

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima
21 Oct 2024
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi, Tamás Bisztray, Richard A. Dubniczky, Rebeka Tóth, B. Borsos, ..., Ryan Marinelli, Lucas C. Cordeiro, Merouane Debbah, Vasileios Mavroeidis, Audun Josang
20 Oct 2024

SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yu Yang, Yuzhou Nie, Zhun Wang, Yuheng Tang, Wenbo Guo, Bo Li, D. Song
ELM · 14 Oct 2024

Are You Human? An Adversarial Benchmark to Expose LLMs
Gilad Gressel, Rahul Pankajakshan, Yisroel Mirsky
DeLMO · 12 Oct 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Andrey Anurin, Jonathan Ng, Kibo Schaffer, Jason Schreiber, Esben Kran
ELM · 10 Oct 2024

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, J. Li, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori
ALM · 02 Oct 2024

Efficient Federated Intrusion Detection in 5G ecosystem using optimized BERT-based model
Frederic Adjewa, Moez Esseghir, Leila Merghem-Boulahia
28 Sep 2024
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, ..., Elizabeth M. Daly, Mark Purcell, P. Sattigeri, Pin-Yu Chen, Kush R. Varshney
AAML · 23 Sep 2024

CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher
Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim
21 Aug 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, ..., Alexander Pan, Gabriel Mukobi, Ryan H. Kim, Stephen Fitz, Dan Hendrycks
ELM · 31 Jul 2024
eyeballvul: a future-proof benchmark for vulnerability detection in the wild
Timothee Chauvin
11 Jul 2024

Badllama 3: removing safety finetuning from Llama 3 in minutes
Dmitrii Volkov
01 Jul 2024

INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
Hung Le, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Doyen Sahoo
23 Jun 2024

Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity
Tam n. Nguyen
ELM · 11 Jun 2024
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen · 24 May 2024

InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
ELM · 11 Mar 2024

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, ..., Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktaschel, Roberta Raileanu
SyDa · 26 Feb 2024
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Wei Ma, Lyuye Zhang, Miaolei Shi, Yingjiu Li
ELM · 29 Jan 2024