ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset

Doğa Elitez
Paul Gessinger
Daniel Murnane
Marcus Selchou Raaholt
Andreas Salzburger
Stine Kofoed Skov
Andreas Stefl
Anna Zaborowska
Main: 14 Pages
16 Figures
Bibliography: 6 Pages
10 Tables
Appendix: 8 Pages
Abstract

We introduce ColliderML, a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s} = 14$ TeV, mean pile-up $\mu = 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading-order matrix-element calculation and showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release fills a major gap for machine learning (ML) research on detector-level data and is hosted on the ML-friendly Hugging Face platform. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the data format and access, and report initial collider physics benchmarks.
