2402.07688
Cited By
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge
12 February 2024
Norbert Tihanyi
M. Ferrag
Ridhi Jain
Tamás Bisztray
Merouane Debbah
ELM
Papers citing "CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge"
13 / 13 papers shown
Title
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Paul Kassianik
Baturay Saglam
Alexander Chen
Blaine Nelson
Anu Vellore
...
Hyrum Anderson
Kojin Oshiba
Omar Santos
Yaron Singer
Amin Karbasi
PILM
56
0
0
28 Apr 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian
Tao Zhang
J. Liu
Jiacheng Wang
Xuangou Wu
...
Ruichen Zhang
W. Zhang
Zhenhui Yuan
Shiwen Mao
Dong In Kim
48
0
0
22 Apr 2025
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
Jaime Raldua Veuthey
Zainab Ali Majid
Suhas Hariharan
Jacob Haimes
ELM
26
0
0
18 Apr 2025
The Digital Cybersecurity Expert: How Far Have We Come?
Dawei Wang
Geng Zhou
Xianglong Li
Yu Bai
Li Chen
Ting Qin
Jian Sun
D. Li
ELM
57
0
0
16 Apr 2025
Large Language Models are Unreliable for Cyber Threat Intelligence
Emanuele Mezzi
Fabio Massacci
Katja Tuma
31
0
0
29 Mar 2025
CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data
Adel ElZemity
Budi Arief
Shujun Li
54
0
0
12 Mar 2025
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks
Javier Yong
Haokai Ma
Yunshan Ma
Anis Yusof
Zhenkai Liang
E. Chang
52
0
0
05 Mar 2025
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Michael Kouremetis
Marissa Dotter
Alex Byrne
Dan Martin
Ethan Michalak
Gianpaolo Russo
Michael Threet
Guido Zarrella
ELM
50
4
0
18 Feb 2025
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
Pengfei Jing
Mengyun Tang
Xiaorong Shi
Xing Zheng
Sen Nie
Shi Wu
Yong Yang
Xiapu Luo
ELM
43
1
0
30 Dec 2024
Multi-Agent Collaboration in Incident Response with Large Language Models
Zefang Liu
LLMAG
AI4CE
71
0
0
01 Dec 2024
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
16
4
0
20 Oct 2024
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
Fatma Yasmine Loumachi
Mohamed Chahine Ghanem
AI4CE
36
1
0
04 Sep 2024
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
Victor C. Dibia
Adam Fourney
Gagan Bansal
Forough Poursabzi-Sangdeh
Han Liu
Saleema Amershi
ALM
OffRL
33
12
0
29 Oct 2022