MassSpecGym: A benchmark for the discovery and identification of molecules

17 February 2025
Roman Bushuiev
Anton Bushuiev
Niek F. de Jonge
A. Young
Fleming Kretschmer
Raman Samusevich
Janne Heirman
Fei Wang
L. Zhang
Kai Dührkop
Marcus Ludwig
Nils A. Haupt
A. Kalia
Corinna Brungs
Robin Schmid
Russell Greiner
Bo Wang
D. Wishart
Li Liu
Juho Rousu
Wout Bittremieux
Hannes L. Röst
Tytus D. Mak
S. Hassoun
Florian Huber
Justin J. J. van der Hooft
Michael A. Stravs
Sebastian Böcker
Josef Sivic
Tomáš Pluskal
Abstract

The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym -- the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at this https URL.

@article{bushuiev2025_2410.23326,
  title={MassSpecGym: A benchmark for the discovery and identification of molecules},
  author={Roman Bushuiev and Anton Bushuiev and Niek F. de Jonge and Adamo Young and Fleming Kretschmer and Raman Samusevich and Janne Heirman and Fei Wang and Luke Zhang and Kai Dührkop and Marcus Ludwig and Nils A. Haupt and Apurva Kalia and Corinna Brungs and Robin Schmid and Russell Greiner and Bo Wang and David S. Wishart and Li-Ping Liu and Juho Rousu and Wout Bittremieux and Hannes L. Röst and Tytus D. Mak and Soha Hassoun and Florian Huber and Justin J.J. van der Hooft and Michael A. Stravs and Sebastian Böcker and Josef Sivic and Tomáš Pluskal},
  journal={arXiv preprint arXiv:2410.23326},
  year={2025}
}