TOSSS: a CVE-based Software Security Benchmark for Large Language Models

11 March 2026

Marc Damie

Murat Bilgehan Ertan

Domenico Essoussi

Angela Makhanu

Gaëtan Peter

Roos Wensveen

Main: 6 pages

Bibliography: 1 page

5 figures

2 tables

Abstract

With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs become embedded in software development workflows, a critical question arises: are LLMs good at software security? At the same time, organizations worldwide invest heavily in cybersecurity to reduce their exposure to disruptive attacks. Integrating LLMs into software engineering workflows may introduce new vulnerabilities and undermine these security efforts.
