NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers

Liujia Yang
Zhuo Yang
Jiaqing Xie
Yubin Wang
Ben Gao
Tianfan Fu
Xingjian Wei
Jiaxing Sun
Jiang Wu
Conghui He
Yuqiang Li
Qinying Gu
Main: 14 Pages
12 Figures
Bibliography: 2 Pages
7 Tables
Appendix: 1 Page
Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is fundamental for molecular structure elucidation, yet interpreting spectra at scale remains time-consuming and highly expertise-dependent. While recent spectrum-as-language modeling and retrieval-based methods have shown promise, they rely heavily on large corpora of computed spectra and exhibit notable performance drops when applied to experimental measurements. To address these issues, we build NMRSpec, a large-scale corpus of experimental $^{1}$H and $^{13}$C spectra mined from chemical literature, and propose NMRTrans, which models spectra as unordered peak sets and aligns the model's inductive bias with the physical nature of NMR. To the best of our knowledge, NMRTrans is the first NMR Transformer trained solely on large-scale experimental spectra. It achieves state-of-the-art performance on experimental benchmarks, improving Top-10 Accuracy over the strongest baseline by 17.82 points (61.15% vs. 43.33%), underscoring the importance of experimental data and structure-aware architectures for reliable NMR structure elucidation.
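To illustrate what treating a spectrum as an unordered peak set can mean in practice, the following is a minimal sketch, not the NMRTrans architecture itself. It assumes each peak is a hypothetical `(shift_ppm, intensity)` pair and applies a single parameter-free self-attention step with no positional encoding, followed by mean pooling. The function name `attention_pool` and the example peak values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def attention_pool(peaks):
    """Summarize a peak set with one self-attention step and mean pooling.

    peaks: list of (shift_ppm, intensity) tuples (hypothetical features).
    No positional encoding is used, so list order carries no information.
    """
    attended = []
    for q in peaks:
        # Scaled dot-product score of this peak against every peak in the set.
        scores = [(q[0] * k[0] + q[1] * k[1]) / math.sqrt(2) for k in peaks]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended.append(
            tuple(sum(w * v[d] for w, v in zip(weights, peaks)) / total
                  for d in range(2))
        )
    # Mean pooling over the set gives a fixed-size, order-independent summary.
    n = len(attended)
    return tuple(sum(a[d] for a in attended) / n for d in range(2))


# Illustrative 1H-like peaks; the values are made up for the example.
peaks = [(7.26, 1.0), (3.71, 3.0), (1.25, 6.0)]
shuffled = peaks[:]
random.Random(0).shuffle(shuffled)

a = attention_pool(peaks)
b = attention_pool(shuffled)
print(all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b)))
```

Because neither the attention step nor the pooling depends on list position, reordering the peaks changes the summary only by floating-point rounding. A sequence model with positional encodings would not have this property by construction.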